Introduction

This document is updated from analyze_temp_serial_transfer_expt--28Oct24.Rmd.

There are 5 treatments: no heat (5 days of serial transfer), 6h heat, 12h heat, 24h heat, and 48h heat. Each treatment was set up with 5 technical replicates, each initially inoculated at approximately equal species ratios (on Day 0).
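The design above can be sketched as a simple grid (a minimal sketch; the variable names are illustrative, not taken from the analysis below):

```r
# full-factorial design: 5 heat-duration treatments x 5 technical replicates
design <- expand.grid(heat_h = c(0, 6, 12, 24, 48), replicate = 1:5)
nrow(design)  # 25 experimental units
```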

Summary of choices: Based on the number of cells observed in true blank wells, I only include data from wells with >50 cells. Based on the Day 0 flow cytometry data (which, judging from the blanks in the OD data, never experienced any contamination events), a misclassification rate of 1% is assumed. A replicate is considered contaminated when it contains a substantially higher percentage of a species that was not inoculated in that well than this misclassification rate would predict. When a contaminated well was detected, all time points associated with that well were removed from the data.
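As a minimal sketch of these exclusion rules (the column names, example counts, and the "substantially higher" cutoff of 5x the misclassification rate are all hypothetical, not from the real data or the actual filtering code below):

```r
library(dplyr)

misclass_rate <- 0.01  # assumed 1% misclassification (from Day 0 data)
min_cells <- 50        # wells with fewer total cells are excluded

# toy example: one clean well and one putatively contaminated well
toy <- tibble(well = c("A1", "B1"),
              count_inoculated   = c(980, 800),
              count_uninoculated = c(5, 120)) %>%
  mutate(total = count_inoculated + count_uninoculated,
         frac_uninoculated = count_uninoculated / total,
         # illustrative threshold: flag wells with > 5x the expected misclassification
         keep = total > min_cells & frac_uninoculated <= 5 * misclass_rate)
toy$keep  # A1 kept, B1 flagged as contaminated
```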

Load & Annotate Data

After loading the environment, I load all of the flow cytometry cell count data and the information about extinct wells from the OD data, and then annotate the data.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl) # for importing data directly from Excel sheet
library(RColorBrewer) # for changing the colours of plots
library(ggbeeswarm) # for beeswarm plots
library(vegan) # to estimate diversity and for ordination (NMDS)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-8
library(ggordiplots) # for ggplotting ellipses around treatment group centroids during ordination
## Loading required package: glue
library(chemodiv) # for estimating species richness
#library(lme4) # apparently not needed directly; it is loaded as a dependency (see sessionInfo below)
library(glmmTMB) # for fitting and trouble-shooting GLM's
library(DHARMa) # for plotting the residuals when using glmmTMB
## This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
library(rcompanion) # for r-squared estimates of GLM's
library(MuMIn) # for calculating AICc
library(performance) # for checking multicollinearity
library(effsize) # for post-hoc estimate of effect sizes
library(emmeans) # (ditto as above)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
library(BSDA) # for pairwise t-tests to compare effect sizes between data subsets
## 
## Attaching package: 'BSDA'
## 
## The following object is masked from 'package:datasets':
## 
##     Orange
#library(partitionBEFsp) # for partitioning the biodiversity effects
#library(ape) # for ordination ??
library(ggforce) # for plotting ellipses in ggplot

# print the complete info about packages and versions currently loaded in the environment:
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## time zone: Europe/Paris
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggforce_0.4.2      BSDA_1.2.2         emmeans_1.10.5     effsize_0.8.1     
##  [5] performance_0.12.4 MuMIn_1.48.4       rcompanion_2.4.36  DHARMa_0.4.7      
##  [9] glmmTMB_1.1.11     chemodiv_0.3.0     ggordiplots_0.4.3  glue_1.8.0        
## [13] vegan_2.6-8        lattice_0.22-6     permute_0.9-7      ggbeeswarm_0.7.2  
## [17] RColorBrewer_1.1-3 readxl_1.4.3       lubridate_1.9.3    forcats_1.0.0     
## [21] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5       
## [25] tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rdpack_2.6.2        gld_2.6.6           sandwich_3.1-1     
##  [4] rlang_1.1.4         magrittr_2.0.3      multcomp_1.4-26    
##  [7] matrixStats_1.4.1   e1071_1.7-16        compiler_4.4.2     
## [10] mgcv_1.9-1          vctrs_0.6.5         pkgconfig_2.0.3    
## [13] fastmap_1.2.0       rmarkdown_2.29      tzdb_0.4.0         
## [16] haven_2.5.4         nloptr_2.1.1        xfun_0.49          
## [19] modeltools_0.2-23   cachem_1.1.0        jsonlite_1.8.9     
## [22] tweenr_2.0.3        parallel_4.4.2      cluster_2.1.6      
## [25] DescTools_0.99.58   R6_2.5.1            coin_1.4-3         
## [28] bslib_0.8.0         stringi_1.8.4       boot_1.3-31        
## [31] lmtest_0.9-40       jquerylib_0.1.4     cellranger_1.1.0   
## [34] numDeriv_2016.8-1.1 estimability_1.5.1  Rcpp_1.0.13-1      
## [37] knitr_1.49          zoo_1.8-12          Matrix_1.7-1       
## [40] splines_4.4.2       timechange_0.3.0    tidyselect_1.2.1   
## [43] rstudioapi_0.17.1   yaml_2.3.10         TMB_1.9.15         
## [46] codetools_0.2-20    plyr_1.8.9          withr_3.0.2        
## [49] coda_0.19-4.1       evaluate_1.0.1      survival_3.7-0     
## [52] polyclip_1.10-7     proxy_0.4-27        pillar_1.10.1      
## [55] nortest_1.0-4       stats4_4.4.2        insight_1.0.0      
## [58] reformulas_0.4.0    generics_0.1.3      hms_1.1.3          
## [61] munsell_0.5.1       scales_1.3.0        rootSolve_1.8.2.4  
## [64] minqa_1.2.8         xtable_1.8-4        class_7.3-22       
## [67] lmom_3.2            tools_4.4.2         data.table_1.16.2  
## [70] lme4_1.1-35.5       Exact_3.3           mvtnorm_1.3-2      
## [73] grid_4.4.2          rbibutils_2.3       libcoin_1.0-10     
## [76] colorspace_2.1-1    nlme_3.1-166        beeswarm_0.4.0     
## [79] vipor_0.4.7         cli_3.6.3           expm_1.0-0         
## [82] gtable_0.3.6        sass_0.4.9          digest_0.6.37      
## [85] TH.data_1.1-2       farver_2.1.2        htmltools_0.5.8.1  
## [88] lifecycle_1.0.4     httr_1.4.7          multcompView_0.1-10
## [91] MASS_7.3-61
# set theme for all plots
fave_theme <- theme_light() + # see other options at https://ggplot2.tidyverse.org/reference/ggtheme.html
              theme(text = element_text(size=15), # larger text size for titles & axes
                    panel.grid.major = element_blank(), # remove major gridlines
                    panel.grid.minor = element_blank()) # remove minor gridlines
theme_set(fave_theme)

# define a palette for plotting the 4 species
species_4pal_alphabetical = palette.colors(8, palette = "R4")[c(3, 5, 7, 2)] #in alphabetical order
species_4pal_speed = palette.colors(8, palette = "R4")[c(7, 5, 3, 2)] #from fast to slow

# define a palette for plotting the 3 treatment days
trtmt_pal = brewer.pal(4, "Set2")[c(4, 3, 1)]

# define a palette for plotting the heat duration
control_to_48h_pal <- scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9)

# define a palette for plotting the inoculated community richness
CommRich_pal <- scale_colour_viridis_d(option = "viridis", begin=0.2, end=0.95)

# define a function to find the mode of a vector. Credit to https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
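For example (ties resolve to whichever value appears first in the vector):

```r
Mode(c(1, 2, 2, 3))     # returns 2
Mode(c("a", "b", "a"))  # returns "a"
```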

Load the cell counts data from the summary Excel files created by FCS Express. Then load the well volume data from the summary .csv files created by the Attune software.

# a function to load the fluorescent counts data (from the .xlsx files created by FCS Express)
import_flow_counts <- function(file)
  return(as.data.frame(
          read_excel(path=file, sheet="FCS Express Report",
                    # replace the column names as follows:
                    col_names = c("Filename",
                                  "Gate1", "Count_grimontii",
                                  "Gate2", "Count_putida",
                                  "Gate3", "Count_protegens",
                                  "Gate4","Count_veronii")))
  )
# a function to load and parse the volume data
import_flow_volume <- function(file) {
  raw.csv <- read.csv(file)
  # keep the volume info and just enough data to identify the sample, then drop the resulting redundant rows
  raw.csv %>% select(Plate, Sample, Volume) %>% unique()
}
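The select-then-unique step collapses the many per-event rows of an Attune .csv down to one row per sample (a toy illustration with made-up values and a hypothetical per-event column):

```r
library(dplyr)

raw.csv <- data.frame(Plate = "24-07-02 Day1 -- Epoch.plate1",
                      Sample = rep(c("A1", "B1"), each = 3),  # several rows per sample
                      Volume = rep(c(98.5, 101.2), each = 3),
                      Event = 1:6)                            # per-event detail we discard
vol_data <- raw.csv %>% select(Plate, Sample, Volume) %>% unique()
nrow(vol_data)  # 2: one row per sample
```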

# a function to loop through the folders containing the data files, open the .xlsx and .csv files and combine their data
import_from_files <- function(dir_vector){
  # initialize variables
  raw_counts <- raw_vols <- data.frame()
  # loop through each directory
  for(dir in dir_vector){
    # get all the file names
    files_v <- list.files(dir)
    
    # identify the excel files
    files_excel <- files_v[endsWith(files_v, ".xlsx")]
    # and loop through all of them to extract their data
    TMPraw_counts <- data.frame()
    for(val in files_excel){
      TMPraw_counts <- rbind(TMPraw_counts, import_flow_counts(paste0(dir, "/", val)))
    }
    
    # identify the csv files
    files_csv <- files_v[endsWith(files_v, ".csv")]
    # and loop through all of them to extract their data
    TMPraw_vols <- data.frame()
    for(val in files_csv){
      TMPraw_vols <- rbind(TMPraw_vols, import_flow_volume(paste0(dir, "/", val)))
    }
    
    # concatenate the data from counts and from vols
    raw_counts <- rbind(raw_counts, TMPraw_counts)
    raw_vols <- rbind(raw_vols, TMPraw_vols)
    rm(TMPraw_counts, TMPraw_vols)
  }
  return(list(raw_counts, raw_vols))
}

# get all of the raw data:
list_rawdata <- import_from_files(c("./raw_data/serial_transf--2July24", "./raw_data/serial_transf--8July24", "./raw_data/serial_transf--5Aug24", "./raw_data/serial_transf--19Aug24"))

Now we can process the data to create unique IDs for each sample. This info needs to be parsed from the Filename column for the flow counts data (i.e., the Excel files) and from the Plate column for the flow volumes data (i.e., the csv files).

# start with flow counts data:
  # I got confused and now there are rows containing the column names. Get rid of those...
list_rawdata[[1]] <- list_rawdata[[1]][-grep("Filename", list_rawdata[[1]]$Filename),]
  # Day0 has a different pattern in the Filename column so let's process those rows first
Day0 <- list_rawdata[[1]][grep("Day0", list_rawdata[[1]]$Filename),] %>% separate_wider_regex(Filename,
                                                                                              c(Date="24-0\\d-\\d{2}", " Day", Day="\\d",
                                                                                                ".*dilution_", Well="\\w\\d+", "\\.acs compensated"))
  # Now process the Filename column for the other days
NOTday0 <- list_rawdata[[1]][-grep("Day0", list_rawdata[[1]]$Filename),] %>% separate_wider_regex(Filename,
                                                                                              c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", " -- ",
                                                                                                Incubator="\\w+", "\\.plate", Plate="\\d", 
                                                                                                ".*dilution_", Well="\\w\\d+", "\\.acs compensated"))
# Put the flow counts data back together into a single data.frame:
raw_Counts <- rbind(Day0 %>% mutate(Incubator=NA, Plate=0), # add in the 2 extra empty columns that are missing from Day0
                    NOTday0) %>% select(-Gate1, -Gate2, -Gate3, -Gate4)
rm(Day0, NOTday0)


# then do a similar thing for the volume data:
  # Day0 has a different pattern in the Plate column so let's process those rows first
Day0 <- list_rawdata[[2]][grep("Day0", list_rawdata[[2]]$Plate),] %>% separate_wider_regex(Plate,
                                                                                           c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", ".*"))
  # Now process the Plate column for the other days
NOTday0 <- list_rawdata[[2]][-grep("Day0", list_rawdata[[2]]$Plate),] %>% separate_wider_regex(Plate,
                                                                                              c(Date="24-0\\d-\\d{2}", " Day", Day="\\d", " -- ",
                                                                                                Incubator="\\w+", "\\.plate", Plate="\\d"))
# Put the flow volumes data back together into a single data.frame:
raw_Vol <- rbind(Day0 %>% mutate(Incubator=NA, Plate=0), # add in the 2 extra empty columns that are missing from Day0
                 NOTday0) %>% rename(Well = Sample) # rename this column for consistency with the Counts data
rm(Day0, NOTday0)

# We can now combine the counts and volume data
  # NB: use left_join here because we don't have volume data for Day 0 on 24-07-02
raw_data <- left_join(raw_Counts, raw_Vol,
                      by=c("Date", "Day", "Well", "Incubator", "Plate"))
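A left join keeps every counts row and fills Volume with NA wherever no matching volume record exists, which is exactly what we need for the missing Day 0 volumes (toy sketch, not the real columns):

```r
library(dplyr)

counts <- tibble(Day = c(0, 1), Well = "A1", n = c(500, 800))
vols   <- tibble(Day = 1,       Well = "A1", Volume = 100)
joined <- left_join(counts, vols, by = c("Day", "Well"))
joined$Volume  # NA for Day 0 (no volume record), 100 for Day 1
```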
rm(raw_Counts, raw_Vol)


# add annotation specifying the Heat treatment and the Incubator
  # For 2July24: all samples were subjected to 6h of heat
  # For 8July24: samples in the Epoch plate reader are control (no heat)
  #              Samples in the H1 plate reader are 48h of heat
  # For 5Aug24: all samples were subjected to 12h of heat
  # For 19Aug24: all samples were subjected to 24h of heat
raw_data$Heat <- 0
raw_data$Heat[which(raw_data$Date == "24-07-02")] <- 6
raw_data$Heat[which(raw_data$Date == "24-07-08" & raw_data$Incubator == "H1")] <- 48
raw_data$Heat[which(raw_data$Date == "24-08-05")] <- 12
raw_data$Heat[which(raw_data$Date == "24-08-19")] <- 24

# change the variable classes for data analysis
raw_data$Count_grimontii <- as.numeric(raw_data$Count_grimontii)
raw_data$Count_putida <- as.numeric(raw_data$Count_putida)
raw_data$Count_protegens <- as.numeric(raw_data$Count_protegens)
raw_data$Count_veronii <- as.numeric(raw_data$Count_veronii)

Finally, we can annotate the data with the sample information for each well. Note that there are different plate layouts for Day 0 (the same for all dates), and that the experiment from 24-07-02 uses a different layout from the rest of the data (see the layout png file in the corresponding data subfolder). A few additional mistakes made during setup require further modified layouts, defined below.

# the "Plate1" layout is used for all days >0 (except for 24-07-02)
layout.plate1 <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
                            putida = c(0, 1, 1, 1, 0, 0, 0, 0,
                                       1, 0, 0, 0, 1, 1, 1, 0,
                                       0, 0, 1, 1, 1, 0, 1, 0,
                                       0, 0, 1, 1, 1, 0, 1, 0,
                                       1, 1, 1, 0, 1, 0, 0, 0,
                                           0, rep(0,6), 1),
                            protegens = c(0, 0, 0, 1, 0, 0, 1, 0,
                                          0, 1, 0, 0, 1, 0, 0, 1,
                                          1, 0, 1, 1, 0, 1, 1, 0,
                                          0, 0, 1, 0, 0, 1, 0, 1,
                                          1, 1, 0, 1, 1, 0, 1, 0,
                                              1, rep(0,6), 0),
                            grimontii =  c(0, 0, 1, 0, 0, 1, 0, 0,
                                           0, 0, 1, 0, 0, 1, 0, 1,
                                           0, 1, 1, 0, 1, 1, 1, 0,
                                           1, 0, 0, 1, 0, 1, 0, 0,
                                           1, 0, 1, 1, 1, 0, 0, 1,
                                               1, rep(0,6), 0),
                            veronii =    c(0, 1, 0, 0, 1, 0, 0, 0,
                                           0, 0, 0, 1, 0, 0, 1, 0,
                                           1, 1, 0, 1, 1, 1, 1, 0,
                                           0, 1, 0, 0, 1, 0, 0, 0,
                                           0, 1, 1, 1, 1, 0, 1, 1,
                                               0, rep(0,6), 0))

### CommRich = 0 corresponds to blanks, mistakes made on Day0 are removed altogether,
###     and CommRich = NA is used to indicate contamination.
# modified layout of plate1 specific for 24-07-02
layout.plate1_2Jul <- layout.plate1
layout.plate1_2Jul$putida[c(1,8, 41:48)]    <- c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0) 
layout.plate1_2Jul$protegens[c(1,8, 41:48)] <- c(1, 0, 1, 0, 0, 1, 0, 1, 0, 0)
layout.plate1_2Jul$grimontii[c(1,8, 41:48)] <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
layout.plate1_2Jul$veronii[c(1,8, 41:48)]   <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1)

# modified layout of plate1 specific for mistakes made on 24-07-08
  # column 4 of OD plate is swapped orientation
layout.plate1_8Jul <- layout.plate1
layout.plate1_8Jul$putida[25:32]    <- layout.plate1$putida[9:16]
layout.plate1_8Jul$protegens[25:32] <- layout.plate1$protegens[9:16]
layout.plate1_8Jul$grimontii[25:32] <- layout.plate1$grimontii[9:16]
layout.plate1_8Jul$veronii[25:32]   <- layout.plate1$veronii[9:16]

# add a column for community richness in all of the above df's
layout.plate1 <- layout.plate1 %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate1_2Jul <- layout.plate1_2Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate1_8Jul <- layout.plate1_8Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")


# the "Plate2" layout
layout.plate2 <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
                            putida = c(1, 1, 1, 0, 1, 0, 0, 0,
                                       1, 0, 1, 0, 0, 0, 1, 1,
                                       1, 0, 1, 0, 0, 0, 1, 1,
                                       1, 0, 0, 0, 1, 1, 1, 0,
                                           rep(0,7), 0,
                                       0, 1, 0, 1, 1, 1, 0, 0),
                            protegens = c(1, 0, 0, 1, 0, 1, 0, 0,
                                          0, 1, 1, 0, 1, 0, 1, 1,
                                          0, 1, 0, 1, 0, 0, 1, 0,
                                          1, 0, 1, 0, 1, 1, 0, 1,
                                              rep(0,7), 1,
                                          0, 1, 1, 0, 1, 1, 0, 0),
                            grimontii = c(0, 1, 0, 1, 0, 0, 1, 0,
                                          1, 1, 1, 0, 0, 1, 1, 0,
                                          0, 1, 0, 0, 1, 0, 0, 1,
                                          1, 0, 0, 1, 1, 0, 1, 1,
                                             rep(0,7), 0,
                                          0, 1, 1, 1, 0, 1, 1, 0),
                            veronii = c(0, 0, 1, 0, 0, 0, 0, 1,
                                        1, 1, 1, 0, 1, 1, 0, 1,
                                        1, 0, 0, 0, 0, 1, 0, 0, 
                                        1, 0, 1, 1, 0, 1, 1, 1,
                                           rep(0,7), 1,
                                        0, 1, 1, 1, 1, 0, 1, 0))

# modified layout of plate2 specific for 24-07-02
layout.plate2_2Jul <- layout.plate2
layout.plate2_2Jul$putida[1:32]    <- layout.plate2$putida[c(9:32,41:47,40)]
layout.plate2_2Jul$protegens[1:32] <- layout.plate2$protegens[c(9:32,41:47,40)]
layout.plate2_2Jul$grimontii[1:32] <- layout.plate2$grimontii[c(9:32,41:47,40)]
layout.plate2_2Jul$veronii[1:32]   <- layout.plate2$veronii[c(9:32,41:47,40)]
layout.plate2_2Jul <- layout.plate2_2Jul[1:32,] # rest of flow plate 2 is empty

# modified layout of plate2 specific for mistakes made on 24-07-08 and 24-08-19
layout.plate2_8Jul19Aug <- layout.plate2[-(9:16),] # I screwed up column 8 of OD plate

# add a column for community richness in all of the above df's
layout.plate2 <- layout.plate2 %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate2_2Jul <- layout.plate2_2Jul %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")
layout.plate2_8Jul19Aug <- layout.plate2_8Jul19Aug %>% mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all")

# the "Inocula" layout
layout.inocula <- data.frame(Well = paste0(LETTERS[1:8], rep((2*1:6)-1, each=8)),
                               putida = rep(c(1, 0, 0, 0, 1, 1, 1, 0,
                                              0, 0, 1, 1, 1, 0, 1, NA), times=3),
                            protegens = rep(c(0, 1, 0, 0, 1, 0, 0, 1,
                                              1, 0, 1, 1, 0, 1, 1, NA), times=3),
                            grimontii = rep(c(0, 0, 1, 0, 0, 1, 0, 1,
                                              0, 1, 1, 0, 1, 1, 1, NA), times=3),
                              veronii = rep(c(0, 0, 0, 1, 0, 0, 1, 0,
                                              1, 1, 0, 1, 1, 1, 1, NA), times=3)) %>%
                mutate(CommRich = putida+protegens+grimontii+veronii , .keep="all") %>%
                    filter(!is.na(CommRich))
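Summing the four 0/1 species columns gives the inoculated community richness, and any NA placeholder propagates into CommRich so that filter(!is.na(CommRich)) drops those wells; a minimal illustration with made-up rows:

```r
library(dplyr)

toy <- tibble(putida = c(1, NA), protegens = c(0, 1),
              grimontii = c(1, 0), veronii = c(0, 1)) %>%
  mutate(CommRich = putida + protegens + grimontii + veronii) %>%
  filter(!is.na(CommRich))
toy$CommRich  # only the complete row survives, with richness 2
```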


# a function to annotate each data set with the indicated layout
  # this will KEEP well blanks!
annotate_samples <- function(layout, select_date, select_plate) {
  relevant_data <- raw_data %>% filter(Date==select_date, Plate==select_plate)
  
  # for Innoc, use inner_join to combine the flow data with its annotation
  if(select_plate == 0){
    output_df <- inner_join(layout, relevant_data, by="Well")
  }
  if(select_plate != 0) {
    output_df <- left_join(merge(layout, relevant_data %>% select(Day, Incubator, Heat) %>% distinct()),
                         relevant_data, by=c("Well", "Day", "Incubator", "Heat"))
    output_df$Date <- select_date
    output_df$Plate <- select_plate
  }

  return(output_df)
}

# now we can add the sample names for each one.
annotated.rawdata <- rbind(annotate_samples(layout = layout.inocula, select_date = "24-07-02", select_plate=0),
                  annotate_samples(layout = layout.plate1_2Jul, select_date = "24-07-02", select_plate=1),
                  annotate_samples(layout = layout.plate2_2Jul, select_date = "24-07-02", select_plate=2),
                  annotate_samples(layout = layout.inocula, select_date = "24-07-08", select_plate=0),
                  annotate_samples(layout = layout.plate1_8Jul, select_date = "24-07-08", select_plate=1),
                  annotate_samples(layout = layout.plate2_8Jul19Aug, select_date = "24-07-08", select_plate=2), ##
                  annotate_samples(layout = layout.inocula, select_date = "24-08-05", select_plate=0),
                  annotate_samples(layout = layout.plate1, select_date = "24-08-05", select_plate=1),
                  annotate_samples(layout = layout.plate2, select_date = "24-08-05", select_plate=2),
                  annotate_samples(layout = layout.inocula, select_date = "24-08-19", select_plate=0),
                  annotate_samples(layout = layout.plate1, select_date = "24-08-19", select_plate=1),
                  annotate_samples(layout = layout.plate2_8Jul19Aug, select_date = "24-08-19", select_plate=2))

# fixing other small mistakes in annotation:
  # Day1 of 24-07-02: sample A1 from plate 2 was loaded into sample A1 plate 1.
annotated.rawdata$CommRich[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
                                   annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 3

annotated.rawdata$putida[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
                                 annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1

annotated.rawdata$protegens[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
                                    annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 0

annotated.rawdata$grimontii[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
                                    annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1

annotated.rawdata$veronii[which(annotated.rawdata$Date=="24-07-02" & annotated.rawdata$Day=="1" &
                                  annotated.rawdata$Well=="A1" & annotated.rawdata$Plate=="1")] <- 1


# Annotate the treatments
annotated.rawdata$Heat_Day <- as.numeric(NA)
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat!=0 & annotated.rawdata$Day==1)] <- 1
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat>6 & annotated.rawdata$Day==2)] <- 2
annotated.rawdata$Heat_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==3)] <- 3

annotated.rawdata$Recov_Day <- as.numeric(NA)
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==6 & annotated.rawdata$Day==2)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==6 & annotated.rawdata$Day==3)] <- 2
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat %in% c(12,24) & annotated.rawdata$Day==3)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat %in% c(12,24) & annotated.rawdata$Day==4)] <- 2
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==4)] <- 1
annotated.rawdata$Recov_Day[which(annotated.rawdata$Heat==48 & annotated.rawdata$Day==5)] <- 2
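The same mapping can be written more compactly with dplyr::case_when (an equivalent sketch on a toy grid, not replacing the assignments above); each heat duration determines which transfer days count as heat exposure vs recovery, and unmatched combinations stay NA:

```r
library(dplyr)

# toy grid covering all Heat x Day combinations
grid <- expand.grid(Heat = c(0, 6, 12, 24, 48), Day = 1:5) %>%
  mutate(Heat_Day = case_when(Heat != 0  & Day == 1 ~ 1,
                              Heat > 6   & Day == 2 ~ 2,
                              Heat == 48 & Day == 3 ~ 3),
         Recov_Day = case_when(Heat == 6  & Day == 2 ~ 1,
                               Heat == 6  & Day == 3 ~ 2,
                               Heat %in% c(12, 24) & Day == 3 ~ 1,
                               Heat %in% c(12, 24) & Day == 4 ~ 2,
                               Heat == 48 & Day == 4 ~ 1,
                               Heat == 48 & Day == 5 ~ 2))
```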

# sanity check to make sure there are no redundant rows
stopifnot(!any(duplicated(annotated.rawdata %>% select(Date, Day, Incubator, Plate, Well))))

# change some of the values to more appropriate types
annotated.rawdata$Plate <- as.numeric(annotated.rawdata$Plate)
annotated.rawdata$Day <- as.numeric(annotated.rawdata$Day)

# clean up
rm(layout.inocula, layout.plate1, layout.plate1_2Jul, layout.plate1_8Jul, layout.plate2, layout.plate2_2Jul, layout.plate2_8Jul19Aug, list_rawdata, raw_data)

The annotated data contains information on the complete dataset, including blank wells and excluded wells. Any mistakes made during inoculation on Day 0 have been removed altogether. CommRich == 0 indicates well blanks that should be empty (in this case, all 4 species columns are also 0). Finally, CommRich == NA indicates data rows that were excluded, e.g. due to low total counts or contamination (in this case, the 4 species columns are kept to indicate what should have been in the excluded well).

For reproducibility, and to check that the metadata is correctly associated with the data, print the metadata out to file. Note that the location on the incubated plates (corresponding to the OD data) differs from the location on the flow cytometry plate. In the code below I create a column for the OD_well and assign unique identifiers for each time series. The metadata file annotation_for_alldata.csv summarizes all of this information.

# annotation for Day 0 lists the plate as plate 0 but let's change that to Innoc
annotated.rawdata$Plate[which(annotated.rawdata$Plate==0)] <- "Innoc"

# copy the metadata to another variable and remove the data columns
metadata <- annotated.rawdata %>% select(-Volume,
                                         -Count_grimontii, -Count_protegens, -Count_putida, -Count_veronii)

# the "Well" and "Plate" columns currently describe only the sample's location on the flow cytometer plate
metadata$filler <- "plate"
metadata <- metadata %>% unite(col="plateNum", c(filler, Plate), sep="", remove = FALSE) %>% 
              unite(col="FLOWplateWell", c(plateNum, Well), sep="-", remove = FALSE) %>% select(-filler, -plateNum)

#####
# add true well sample location to metadata (i.e., as corresponding to OD data)
#####
# split up the Well into separate columns for the row and column location
metadata <- metadata %>% separate_wider_regex(Well, c(row="\\w", col="\\d+"))
metadata$REALcol <- 0

# for non-Innoc days after 2 July, the pattern is actually very simple and systematic
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==1)] <- 1
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==3)] <- 2
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==5)] <- 3
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==7)] <- 4
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==9)] <- 5
metadata$REALcol[which(metadata$Plate==1 & metadata$Date > "24-07-02" & metadata$col==11)] <- 6
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==1)] <- 7
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==3)] <- 8
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==5)] <- 9
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==7)] <- 10
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==9)] <- 11
metadata$REALcol[which(metadata$Plate==2 & metadata$Date > "24-07-02" & metadata$col==11)] <- 12
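Because the flow plates use only the odd columns, the block above is equivalent to the arithmetic mapping REALcol = (col + 1)/2, plus 6 for plate 2 (a sketch of the rule; the explicit assignments are kept for easier auditing):

```r
# flow-plate column (odd: 1, 3, ..., 11) -> OD-plate column (1..6 for plate 1, 7..12 for plate 2)
flow_to_od_col <- function(col, plate) (col + 1) / 2 + ifelse(plate == 2, 6, 0)
flow_to_od_col(11, 1)  # 6
flow_to_od_col(1, 2)   # 7
```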

# for non-Innoc days on 2 July, the pattern is similar for plate 1 columns 1 to 9:
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==1)] <- 1
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==3)] <- 2
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==5)] <- 3
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==7)] <- 4
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==9)] <- 5
# the pattern changes from here:
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==11)] <- 7
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==1)] <- 8
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==3)] <- 9
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==5)] <- 10
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==7)] <- 12
# Note that plate 2 Well H7 on flow actually comes from H11
metadata$REALcol[which(metadata$Plate==2 & metadata$Date == "24-07-02" & metadata$col==7 & metadata$row=="H")] <- 11
# and for plate 1 column 1, Well H1 on flow actually comes from H6
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$col==1 & metadata$row=="H")] <- 6
# finally, plate 1 column1: Well A1 on flow actually comes from A6. But note the mistake on Day1
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$Day!=1 & metadata$col==1 & metadata$row=="A")] <- 6
# on 2 July Day 1, plate 1 well A1 on flow actually comes from A8
metadata$REALcol[which(metadata$Plate==1 & metadata$Date == "24-07-02" & metadata$Day==1 & metadata$col==1 & metadata$row=="A")] <- 8

# now we are finished with the NON-INNOC annotations
# we can put together the row and REALcol columns to get the location on the OD plate
data_meta <- metadata %>% filter(Plate != "Innoc") %>% unite("OD_well", c(row, REALcol), sep="") %>% select(-col) %>%
                    unite("uniqID", c(Date, Incubator, OD_well), sep=" ", remove = FALSE)

# last (and perhaps least), annotate the additional blank wells from 2 July
july2_blanks <- data_meta %>% filter(Date=="24-07-02", CommRich==0) %>%
                    select(-FLOWplateWell, -Plate, -uniqID, -OD_well) %>% distinct()
missing_blanks <- data.frame(OD_well=c("A1", "H1", "H12", paste0(LETTERS[1:7],11), paste0(LETTERS[2:7],6)),
                             FLOWplateWell=NA, Plate=NA) %>%
                    mutate(uniqID=paste("24-07-02 Epoch", OD_well), .keep="all")
july2_missing <- merge(july2_blanks,missing_blanks)
data_meta <- rbind(data_meta, july2_missing)
rm(july2_blanks, missing_blanks, july2_missing)

#####
# Innoc data: add OD_well and uniqID columns
#####
# In order to annotate the most raw version of the data, I decided to create redundant rows for the Innoc data. This way each row from Innoc appears 5x with its associated OD_well and uniqID.
innoc_meta <- metadata %>% filter(Plate == "Innoc") %>% select(-row, -col, -REALcol, -Incubator, -Heat)
innoc_meta <- suppressWarnings( # we expect left_join to be upset about many-to-many relationship, no need to issue warning.
                    left_join(innoc_meta,
                              data_meta %>%
                                  select(-FLOWplateWell, -Day, -Plate, -Heat_Day, -Recov_Day) %>%
                                      distinct(), # remove the redundant rows from each day
                              by = c("CommRich", "putida", "protegens", "grimontii", "veronii", "Date"))
              )
# trash the now old df to avoid confusion
rm(metadata)

# save the complete metadata to file
write.csv(rbind(data_meta, innoc_meta), file="./intermediate_data/annotation_for_alldata.csv", quote=FALSE, row.names=FALSE)

#####
# Save the fully annotated raw flow cytometry counts data
#####
# associate the metadata back with the raw counts data
# for Days > 0:
temp_metadata <- data_meta %>% separate_wider_regex(FLOWplateWell, c(FLOW="plate\\w+-", Well="\\w+"))
annot.days <- inner_join(temp_metadata, annotated.rawdata,
                         by=c("Well", "putida", "protegens", "grimontii", "veronii", "CommRich", "Date",
                              "Day", "Incubator", "Plate", "Heat", "Heat_Day", "Recov_Day")) %>%
                  unite("FLOWplateWell", c(FLOW, Well), sep="")
rm(temp_metadata, data_meta)

# for Innoc Days:
temp_metainnoc <- innoc_meta %>% separate_wider_regex(FLOWplateWell, c(FLOW="plate\\w+-", Well="\\w+"))
annot.innoc <- left_join(temp_metainnoc,
                         annotated.rawdata %>% select(-Incubator, -Heat),
                         by=c("Well", "putida", "protegens", "grimontii", "veronii", "CommRich", "Date",
                              "Day", "Plate", "Heat_Day", "Recov_Day")) %>%
                  unite("FLOWplateWell", c(FLOW, Well), sep="")

# save this to file as well
write.csv(rbind(annot.days, annot.innoc), file="./intermediate_data/flow_rawdata.csv", quote=FALSE, row.names=FALSE)

# remove annotated.rawdata as it has been superseded by annot.days and annot.innoc
rm(annotated.rawdata, temp_metainnoc, innoc_meta)

########
# fix annotation mistake for flow cytometry acquisition of uniqID "24-07-02 Epoch A6"
# this well is missing on Day 1 bc A8 was pipetted there instead. But now we have 2 wells for "24-07-02 Epoch A8"...
########
wrong_row <- which(annot.days$uniqID == "24-07-02 Epoch A8" & annot.days$Day == 1 & is.na(annot.days$Volume))
annot.days$uniqID[wrong_row] <- "24-07-02 Epoch A6"
annot.days$OD_well[wrong_row] <- "A6"
annot.days$putida[wrong_row] <- 0
annot.days$protegens[wrong_row] <- 1
annot.days$grimontii[wrong_row] <- 1
annot.days$veronii[wrong_row] <- 0
annot.days$CommRich[wrong_row] <- 2
rm(wrong_row)

The data from Day 0 (annot.innoc) comprises 3 measurements of the inoculum used to inoculate the 5 replicates. In the summary annotation file (aka metadata above), each FLOWplateWell appears redundantly with its (up to 5) associated uniqID and OD_well replicates. I chose this so that the raw flow cytometry values are preserved alongside their relevant annotation.

Below, this Day 0 data is averaged across the 3 different FLOWplateWell values. This means that, from here on, Day 0 is joined with the rest of the data in the variable annotated.rawdata.

# average the Day 0 data across its redundant flow cytometry measurements...
mean.innoc <- annot.innoc %>% group_by(uniqID, OD_well, Incubator, Plate, Heat, Date,
                                 Day, Heat_Day, Recov_Day,
                                 CommRich, putida, protegens, grimontii, veronii) %>%
          summarise(Mean_putida = mean(Count_putida),
                    Mean_protegens = mean(Count_protegens),
                    Mean_grimontii = mean(Count_grimontii),
                    Mean_veronii = mean(Count_veronii),
                    SD_putida = sd(Count_putida),
                    SD_protegens = sd(Count_protegens),
                    SD_grimontii = sd(Count_grimontii),
                    SD_veronii = sd(Count_veronii),
                    Vol_mean = mean(Volume),
                    vol_sd = sd(Volume))
## `summarise()` has grouped output by 'uniqID', 'OD_well', 'Incubator', 'Plate',
## 'Heat', 'Date', 'Day', 'Heat_Day', 'Recov_Day', 'CommRich', 'putida',
## 'protegens', 'grimontii'. You can override using the `.groups` argument.
# here are some plots summarizing how much variation there is between measurements of the same inocula
plotting_mean.innoc <- mean.innoc %>% pivot_longer(cols = Mean_putida:SD_veronii,
                                                   names_to = c(".value", "species"),
                                                   names_sep = "_") %>%
                          filter(Incubator != "H1") # the same inoculum was used for 2 treatments on 24-07-08. Remove this redundancy for plotting
ggplot(plotting_mean.innoc,
       aes(x=Mean, y=SD, colour=species)) +
  geom_point(alpha=0.7) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  labs(title="3 measures of inoculum")

ggplot(plotting_mean.innoc,
       aes(x=Mean, y=SD, colour=Date)) +
  geom_point(alpha=0.7) +
  labs(title="3 measures of inoculum")

ggplot(plotting_mean.innoc %>% mutate(CV = SD/Mean),
       aes(x=Mean, y=CV, colour=species)) +
  geom_point(alpha=0.7) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  labs(title="3 measures of inoculum")
## Warning: Removed 159 rows containing missing values or values outside the scale range
## (`geom_point()`).

# due to false positive counts,
# the CV blows up when I am counting species that are not actually in that sample

ggplot(plotting_mean.innoc %>% filter((putida == 1 & species == "putida") |
                                      (protegens == 1 & species == "protegens") |
                                      (grimontii == 1 & species == "grimontii") |
                                      (veronii == 1 & species == "veronii")) %>%
            mutate(CV = SD/Mean),
       aes(x=Mean, y=CV, colour=species)) +
  geom_point(alpha=0.7) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  labs(title="3 measures of inoculum (remove absent sp)")

ggplot(plotting_mean.innoc %>% unite("community", putida:veronii),
       aes(x=community, y=Mean, colour=species)) +
  geom_point(alpha=0.7) +
  geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  labs(title="3 measures of inoculum")

ggplot(plotting_mean.innoc %>% filter((putida == 1 & species == "putida") |
                                      (protegens == 1 & species == "protegens") |
                                      (grimontii == 1 & species == "grimontii") |
                                      (veronii == 1 & species == "veronii")) %>%
            unite("community", putida:veronii),
       aes(x=community, y=Mean, colour=species)) +
  geom_point(alpha=0.7) +
  geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  labs(title="3 measures of inoculum (remove absent sp)")

ggplot(plotting_mean.innoc %>% ungroup() %>%
              select(-uniqID, -OD_well) %>% distinct() %>%
                              filter((putida == 1 & species == "putida") |
                                      (protegens == 1 & species == "protegens") |
                                      (grimontii == 1 & species == "grimontii") |
                                      (veronii == 1 & species == "veronii")) %>%
            unite("community", putida:veronii) %>% mutate(CV = SD/Mean),
       aes(x=community, y=CV, colour=species)) +
  geom_jitter(width=0.2, alpha=0.7) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  labs(title="3 measures of inoculum (remove absent sp)")

# I have no idea what I would do with this info about sample volume, but here it is:
  # There are no values for 24-07-02 because I forgot to save the .apx file (this was before I realized that volume is not saved in the .acs files)
ggplot(plotting_mean.innoc,
       aes(x=Vol_mean, y=vol_sd, colour=Date)) +
  geom_point(alpha=0.7) +
  labs(title="3 measures of inoculum")
## Warning: Removed 300 rows containing missing values or values outside the scale range
## (`geom_point()`).

# finally, we can add the mean counts for the Innoc to the whole data
annotated.rawdata <- mean.innoc %>% select(-SD_putida, -SD_protegens, -SD_grimontii, -SD_veronii, -vol_sd) %>%
                      rename(Count_putida=Mean_putida, Count_protegens=Mean_protegens, Count_grimontii=Mean_grimontii, Count_veronii=Mean_veronii, Volume=Vol_mean)
annotated.rawdata <- rbind(annotated.rawdata, annot.days)

# cleanup
rm(annot.days, annot.innoc, plotting_mean.innoc)

Misclassification rate: estimated from inocula (Day 0)

I define the misclassification rate as \(\frac{\text{false positive events}}{\text{total events across all fluorescences}}\). In other words, I count the number of events in the gate(s) where I know there should be zero, then divide by the total number of fluorescent events in that well. To estimate this rate, I use the data from Day 0.
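As a toy illustration of this definition (made-up counts, not the real data), consider a putida monoculture well:

```r
# Hypothetical counts for a putida monoculture well: every event in the
# other three gates is by definition a false positive.
counts <- c(putida = 9900, protegens = 70, grimontii = 20, veronii = 10)
truly_present <- c(putida = 1, protegens = 0, grimontii = 0, veronii = 0)

false_pos <- sum(counts[truly_present == 0])   # 100 events in empty gates
misclass_rate <- false_pos / sum(counts)       # 100 / 10000
misclass_rate  # 0.01
```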

# use Day 0 inoculum measurements for a first pass at estimating the misclassification rate
# i.e., the rate of falsely classifying as species A when I know for certain that species A is not present in my sample
misclass.innoc <- mean.innoc %>% mutate(Total_counts = Mean_putida + Mean_protegens + Mean_grimontii + Mean_veronii) %>% # get total for each sample
                    ungroup() %>% select(-uniqID, -OD_well) %>% distinct() %>%   # remove any redundant data
                        # put each species count in its own row in the column called mean (instead of having a column for each species)
                          pivot_longer(cols = Mean_putida:SD_veronii,
                                                   names_to = c(".value", "species"),
                                                   names_sep = "_") %>%
                            filter(Incubator != "H1") %>% # remove the redundant data
                              # keep just the instances where we know for sure that this species was NOT present
                                filter((putida == 0 & species == "putida") |
                                      (protegens == 0 & species == "protegens") |
                                      (grimontii == 0 & species == "grimontii") |
                                      (veronii == 0 & species == "veronii")) %>%
                                  # misclassification rate is the number of events / total counts
                                   mutate(mean_rate = Mean/Total_counts,
                                          sd_rate = SD/Total_counts)

# re-order the species from fast to slow for better plotting
misclass.innoc$species <- factor(misclass.innoc$species,
                                 levels = c("putida", "protegens", "grimontii", "veronii"))

ggplot(misclass.innoc, aes(x=species, y=mean_rate, colour=species)) +
  geom_beeswarm(alpha=0.5) +
  geom_errorbar(aes(ymin=mean_rate-sd_rate, ymax=mean_rate+sd_rate), width=.05, alpha=0.2) +
  scale_colour_manual(values=species_4pal_speed) +
  labs(title="misclassification rate in inoculum", y="mean +/- SD")

max(misclass.innoc$mean_rate)
## [1] 0.008537981
ggplot(misclass.innoc %>% unite("community", putida:veronii),
       aes(x=species, y=mean_rate, colour=species)) +
  facet_wrap(vars(community)) +
  geom_point(alpha=0.5) +
  scale_y_continuous(breaks = c(0, 0.005, 0.01)) +
  scale_colour_manual(values=species_4pal_speed) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title="misclassification rate in inoculum")

# summarize the mean and max misclassification rates observed for each species
misclass.innoc %>% group_by(species) %>% summarise(mean_misclass = mean(mean_rate),
                                                   max_misclass = max(mean_rate))
# clean-up
rm(misclass.innoc, mean.innoc)

From here we can clearly see that the misclassification rate can be as high as 1% and that it depends on the species. Protegens is the most likely to be misclassified and, from the plot of all possible community combinations, the problem seems to be that putida cells are being misclassified as protegens.

But I know that this misclassification rate also depends on environmental conditions, so I don’t think it makes sense to correct the data using the exact values given above. The more cautious approach is to treat any counts below 1% with suspicion.
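A minimal sketch of that rule (hypothetical numbers; `below_floor` is a label invented here, not a column in the real data):

```r
library(tibble)
library(dplyr)

# Hypothetical relative densities for one well: any species under the 1%
# floor is flagged as possible misclassification rather than "corrected".
toy <- tibble(species = c("putida", "protegens", "grimontii"),
              relDen  = c(0.920, 0.004, 0.076))
toy <- toy %>% mutate(below_floor = relDen < 0.01)
toy$species[toy$below_floor]  # "protegens"
```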

Data Processing

Here we make decisions about which data to keep and which to toss.

Minimum Number of Events

I need to set a threshold for the minimum number of fluorescent events observed in a well, below which I decide that the well is not trustworthy.

At some point I did sample some wells that are true negatives. From these we learn that a true negative can have as many as 20 total events.

Remember that I set the stopping conditions at 10 000 events in the cell gate OR the end of the sample (which seems to be 146 µL). Let’s rather arbitrarily set the minimum total events in a well at 51 and see what happens with that.
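A toy sketch of what this cutoff does (invented rows; the counts are NA'ed but the annotation row survives, mirroring the treatment of the real data):

```r
# Wells with 50 or fewer total events keep their row (and annotation)
# but have their counts replaced with NA.
toy <- data.frame(uniqID = c("w1", "w2", "w3"),
                  Total_counts = c(9500, 12, 230))
toy$Total_counts[toy$Total_counts < 51] <- NA
sum(is.na(toy$Total_counts))  # 1 well excluded
```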

annotated.rawdata <- annotated.rawdata %>% mutate(Total_counts = Count_putida + Count_protegens + Count_grimontii + Count_veronii) %>%
                                           mutate(Total_density = Total_counts/Volume)

# plot the counts and volume for true negative wells
ggplot(annotated.rawdata %>% filter(CommRich==0, !is.na(Total_counts)),
       aes(x=Total_counts, y=Volume)) +
  geom_point() +
  labs(title="True negatives")

# plot the total counts as a histogram just to see what the dispersion is like
ggplot(annotated.rawdata, aes(x=Total_counts)) +
  geom_histogram(colour="black", fill="white") +
  labs(title="everything")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 387 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(annotated.rawdata, aes(x=Total_counts)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_counts in LOG SCALE!", title="everything")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 396 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(annotated.rawdata, aes(x=Total_counts)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_continuous(limits = c(-10,1010)) +
  labs(title="everything")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2014 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).

# then plot the total counts against the volume, because we expect these very low counts to be associated with the highest volumes
ggplot(annotated.rawdata, aes(x=Total_counts, y=Volume, colour=as.factor(Heat_Day))) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  labs(colour="Day of heat") +
  labs(title="everything")
## Warning: Removed 1652 rows containing missing values or values outside the scale range
## (`geom_point()`).

# okay, let's just see a histogram of the total cell density
ggplot(annotated.rawdata, aes(x=Total_density)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!") +
  labs(title="everything")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 471 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(annotated.rawdata %>% filter(!is.na(Heat_Day)),
       aes(x=Total_density)) +
  facet_grid(rows = vars(Heat_Day)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!", title="Day of Heat (everything)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 161 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(annotated.rawdata %>% filter(!is.na(Recov_Day)),
       aes(x=Total_density)) +
  facet_grid(rows = vars(Recov_Day)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!", title="Day of Recovery (everything)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 133 rows containing non-finite outside the scale range
## (`stat_bin()`).

### check what these graphs look like when I exclude wells where Total_counts < 51
ggplot(annotated.rawdata %>% filter(Total_counts > 50),
       aes(x=Total_counts, y=Volume, colour=as.factor(Heat_Day))) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  labs(colour="Day of heat") +
  labs(title="Total_counts > 50")
## Warning: Removed 1218 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(annotated.rawdata %>% filter(Total_counts > 50),
       aes(x=Total_density)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!") +
  labs(title="Total_counts > 50")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 75 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(annotated.rawdata %>% filter(Total_counts > 50, !is.na(Heat_Day)),
       aes(x=Total_density)) +
  facet_grid(rows = vars(Heat_Day)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!", title="Day of Heat (Total_counts > 50)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(annotated.rawdata %>% filter(Total_counts > 50, !is.na(Recov_Day)),
       aes(x=Total_density)) +
  facet_grid(rows = vars(Recov_Day)) +
  geom_histogram(colour="black", fill="white") +
  scale_x_log10() +
  labs(x="Total_density in LOG SCALE!", title="Day of Recovery (Total_counts > 50)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#######
# set threshold of > 50 events in total
#######
# copy everything EXCEPT BLANK WELLS to new variable
the.data <- annotated.rawdata %>% filter(CommRich != 0)

# summarize some information about the data points that I'm about to exclude
the.data %>% filter(Total_counts < 51) %>% ungroup() %>% select(uniqID, Heat, Day, Heat_Day, Recov_Day, CommRich, Volume, Total_counts) %>% summary()
##     uniqID               Heat            Day           Heat_Day   
##  Length:87          Min.   : 0.00   Min.   :2.000   Min.   :2.00  
##  Class :character   1st Qu.:48.00   1st Qu.:3.000   1st Qu.:2.75  
##  Mode  :character   Median :48.00   Median :4.000   Median :3.00  
##                     Mean   :41.66   Mean   :3.621   Mean   :2.75  
##                     3rd Qu.:48.00   3rd Qu.:4.000   3rd Qu.:3.00  
##                     Max.   :48.00   Max.   :5.000   Max.   :3.00  
##                                                     NA's   :47    
##    Recov_Day        CommRich         Volume       Total_counts  
##  Min.   :1.000   Min.   :1.000   Min.   :145.0   Min.   : 0.00  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:145.0   1st Qu.: 2.50  
##  Median :1.000   Median :2.000   Median :145.0   Median : 7.00  
##  Mean   :1.462   Mean   :1.782   Mean   :145.4   Mean   :10.72  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:146.0   3rd Qu.:15.00  
##  Max.   :2.000   Max.   :4.000   Max.   :146.0   Max.   :50.00  
##  NA's   :48
# exclude from analysis all non-blank rows where Total_counts < 51
the.data$CommRich[which(the.data$Total_counts < 51)] <- NA

# put the blank data back with the whole dataset
the.data <- rbind(the.data,
                  annotated.rawdata %>% filter(CommRich == 0))

# replace the data with NA values for all rows where Total_counts < 51
    # this includes both the excluded unreliable data as well as the true blanks flow data
the.data$Count_putida[which(the.data$Total_counts < 51)] <- NA
the.data$Count_protegens[which(the.data$Total_counts < 51)] <- NA
the.data$Count_grimontii[which(the.data$Total_counts < 51)] <- NA
the.data$Count_veronii[which(the.data$Total_counts < 51)] <- NA
the.data$Total_density[which(the.data$Total_counts < 51)] <- NA
the.data$Total_counts[which(the.data$Total_counts < 51)] <- NA

# clean-up
rm(annotated.rawdata)

I have re-assigned to NA values all wells that had fewer than 51 total fluorescent events. This was a total of 87 wells.

Note that I’ve also removed any flow cytometry data from the true negative wells. This was 17 wells.

# calculate densities and relative abundances for each species
the.data <- the.data %>% mutate(Conc_putida = Count_putida/Volume,
                                Conc_protegens = Count_protegens/Volume,
                                Conc_grimontii = Count_grimontii/Volume,
                                Conc_veronii = Count_veronii/Volume,
                                relDen_putida = Count_putida/Total_counts,
                                relDen_protegens = Count_protegens/Total_counts,
                                relDen_grimontii = Count_grimontii/Total_counts,
                                relDen_veronii = Count_veronii/Total_counts) #%>%
              #select(-Total_counts)

# sanity check that the relative densities are always adding up to 1
check <- the.data %>% mutate(sum_relDen = relDen_putida + relDen_protegens + relDen_grimontii + relDen_veronii) %>%
            # for convenience, remove the 87 NA values
            drop_na(Total_counts)
all.equal(check$sum_relDen, rep(1, nrow(check))) %>% # use all.equal() as there are values very close to 1 but not exactly equal to 1
  stopifnot()

rm(check)

I have calculated the relative densities and made sure that all relative densities add up to 1.

Plot preliminary time-series

Before diving deeper into the data, let’s just see quickly what the time series look like:

# check: is each replicated time series annotated appropriately so that it can be pieced together?
the.data <- the.data %>% unite("community", putida:veronii, remove=FALSE) %>% ungroup()
for(com in unique(the.data$community)) {
  plot( ggplot(the.data %>% filter(community==com) %>%
           select(uniqID, Heat, Day, relDen_putida, relDen_protegens, relDen_grimontii, relDen_veronii) %>%
              pivot_longer(cols=starts_with("relDen"), names_to="species", names_prefix="relDen_", values_to="relDensity"),
         aes(x=Day, y=relDensity, colour=species, group=paste(uniqID,Heat,species))) +
    facet_grid(~Heat) +
    geom_point(alpha=0.2) +
    geom_line(alpha=0.5) +
    scale_colour_manual(values=species_4pal_alphabetical) +
    labs(title=com))
}
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 92 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 48 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 40 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 36 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 40 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 68 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 64 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 32 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1516 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1516 rows containing missing values or values outside the scale range
## (`geom_line()`).

# clean up
rm(com)

After staring at the above time series for long enough, two things become clear:

  1. Protegens has contaminated several wells. This is unambiguous contamination when it is present in communities where it was not inoculated. For these contaminated replicates, the entire time series will be excluded from the analysis.

  2. The misclassification rate varies over time: e.g., putida is misclassified in a protegens monoculture (0_1_0_0) on Day 1 for 3 different heat treatments, appearing in that well at a relative density of >10%! Likewise, protegens and veronii are misclassified in the putida monoculture (1_0_0_0) on Days 2 and 3 of the 24 h heat treatment.

Identify wells driven to extinction

Of the 87 NA values identified above,

  • Some occurred as a result of flow cytometry issues, e.g., there was probably a bubble that I didn’t notice. (When I noticed a bubble, I would re-run that well, but only after I began to understand that this was happening, so some wells were unfortunately lost to this error.)

    IN THIS CASE: this is a true NA value. It only happens at one time point (which may or may not be a heat day). And there is data for this well during the recovery period.

  • Some occurred as a result of prolonged heat exposure that dropped the total density in that well below the threshold of detection. This only happened on day 3 of heat for the longest heat treatment. There is data for this well during the recovery period.

    IN THIS CASE: this is a true NA value.

  • Others occurred as a result of prolonged heat exposure that drove the well to complete extinction. There is no flow cytometry data for this well during the recovery period because it went extinct. Extinction needs to be confirmed against the OD data.

    IN THIS CASE: this is a true NA value during the heat treatment but it should become a 0 value during the recovery period.
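The decision rule in these bullets can be sketched as a tiny helper (a hypothetical function for illustration, not part of the pipeline):

```r
# An NA becomes a true 0 only on recovery days for wells whose extinction
# is confirmed by the OD data; everything else stays NA.
classify_na <- function(is_recovery_day, od_confirms_extinct) {
  if (is_recovery_day && od_confirms_extinct) 0 else NA_real_
}
classify_na(TRUE,  TRUE)   # extinct well, recovery period -> 0
classify_na(TRUE,  FALSE)  # e.g. a bubble on a recovery day -> NA
classify_na(FALSE, TRUE)   # heat day, even for an extinct well -> NA
```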

The OD data is analyzed in the file called main_expt--OD_analysis.Rmd. This script outputs a csv file indicating the extinct wells, which I will use below.

# import extinct well data from file
extinct <- read.csv("./intermediate_data/extinctOD_wells.csv")

# we know that there was no detectable growth on Recovery days. So replace the current values with true 0's here.
the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Recov_Day>0),] <- the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Recov_Day>0),] %>%
          mutate(Total_density=0, Conc_putida=0, Conc_protegens=0, Conc_grimontii=0, Conc_veronii=0,
                 relDen_putida=0, relDen_protegens=0, relDen_grimontii=0, relDen_veronii=0,
                 CommRich=putida+protegens+grimontii+veronii)
# during the heat days, we know that there was no OD-detectable growth for (extinct$Day + 1).
# This means any flow cytometry data we have is unreliable and should be replaced with NA.
  # wells where Day 2 is unreliable
tmp <- extinct %>% filter(Day == 1)
the.data[which(the.data$uniqID %in% tmp$uniqID & the.data$Day==2),] <- the.data[which(the.data$uniqID %in% tmp$uniqID & the.data$Day==2),] %>%
          mutate(Total_density=NA, Conc_putida=NA, Conc_protegens=NA, Conc_grimontii=NA, Conc_veronii=NA,
                 relDen_putida=NA, relDen_protegens=NA, relDen_grimontii=NA, relDen_veronii=NA,
                 CommRich=NA)
rm(tmp)
  # wells where Day 3 is unreliable (and Day 3 is a heat day!)
extinct <- extinct[-which(extinct$uniqID %in% c("24-08-19 Epoch B4", "24-08-19 Epoch D2")),]
the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Day==3),] <- the.data[which(the.data$uniqID %in% extinct$uniqID & the.data$Day==3),] %>%
          mutate(Total_density=NA, Conc_putida=NA, Conc_protegens=NA, Conc_grimontii=NA, Conc_veronii=NA,
                 relDen_putida=NA, relDen_protegens=NA, relDen_grimontii=NA, relDen_veronii=NA,
                 CommRich=NA)
rm(extinct)

Distinguish between contamination & misclassification

To address both the problem of contamination and the problem of the misclassification rate varying over time, I had to closely re-examine the raw flow cytometry data (which I did by eye, insert crying emoji).

From the Day 0 data, I hypothesized that the misclassification rate is ~1%. So let’s pull up the identity of all the flow cytometry data where >1% of the relative density is attributed to a species that was not inoculated in that well (and therefore should not be there). I then manually examined the raw flow cytometry data files in FCS Express for all the wells listed below:

# identify contamination at 1%
contamin.df <- the.data %>%  filter((putida == 0 & relDen_putida > 0.01) |
                                    (protegens == 0 & relDen_protegens > 0.01) |
                                    (grimontii == 0 & relDen_grimontii > 0.01) |
                                    (veronii == 0 & relDen_veronii > 0.01))

contamin.df %>% filter(Date %in% c("24-08-05", "24-08-19")) %>% select(Date, FLOWplateWell, Day, community,
                                                                       relDen_putida, relDen_protegens, relDen_grimontii, relDen_veronii)

The gating for all the wells listed above (and more) was double-checked by eye in FCS Express and new cell counts were exported. There seemed to be a correlation between the heat environment, or at least the day of the serial transfer, and how clean or messy the gating looked; in particular, it seemed more difficult to classify species during heat days.

We know from the OD data (see main_expt--OD_analysis) that the 24h heat treatment had no detected instance of contamination (at least for the blank wells). Since I suspect that the misclassification rate changes with the heat day, let’s assume that the 24h heat treatment contains no contamination events, and then use the occurrence of species that were never inoculated in those wells to estimate the misclassification rate on different days of the serial transfer.

(In the future: it should also be possible to get a covariance matrix to estimate which species are being misclassified as which.)

misclass24 <- the.data %>% filter(Day > 0, Heat == 24) %>%
                filter((putida == 0 & relDen_putida > 0) |
                       (protegens == 0 & relDen_protegens > 0) |
                       (grimontii == 0 & relDen_grimontii > 0) |
                       (veronii == 0 & relDen_veronii > 0))
# separate the correctly called species from the species that are absent
misclass24_REAL <- misclass24 %>% mutate(relDen_putida = putida * relDen_putida,
                                         relDen_protegens = protegens * relDen_protegens,
                                         relDen_grimontii = grimontii * relDen_grimontii,
                                         relDen_veronii = veronii * relDen_veronii)
misclass24 <- misclass24 %>% mutate(relDen_putida = abs(putida-1) * relDen_putida,
                                    relDen_protegens = abs(protegens-1) * relDen_protegens,
                                    relDen_grimontii = abs(grimontii-1) * relDen_grimontii,
                                    relDen_veronii = abs(veronii-1) * relDen_veronii)
# pivot longer so there's a column for species
misclass24_REAL <- misclass24_REAL %>% pivot_longer(cols = relDen_putida:relDen_veronii,
                                                    values_to = "relDen",
                                                    names_to = "species",
                                                    names_prefix = "relDen_") %>%
                    select(uniqID, Day, community, putida, protegens, grimontii, veronii,
                           Total_density, relDen, species)
misclass24 <- misclass24 %>% pivot_longer(cols = relDen_putida:relDen_veronii,
                                                    values_to = "relDen",
                                                    names_to = "species",
                                                    names_prefix = "relDen_") %>%
                select(uniqID, Day, community, putida, protegens, grimontii, veronii,
                       Total_density, relDen, species)
# remove the true species from the misclass data because these are now fake 0's
misclass24 <- misclass24[-which(misclass24$putida == 1 & misclass24$species == "putida"),]
misclass24 <- misclass24[-which(misclass24$protegens == 1 & misclass24$species == "protegens"),]
misclass24 <- misclass24[-which(misclass24$grimontii == 1 & misclass24$species == "grimontii"),]
misclass24 <- misclass24[-which(misclass24$veronii == 1 & misclass24$species == "veronii"),]
# remove the single contaminated sample
misclass24 <- misclass24[-which(misclass24$protegens == 0 & misclass24$species == "protegens" & misclass24$relDen > 0.75),]

ggplot(misclass24,
       aes(x=species, y=relDen, colour=species)) +
  facet_wrap(vars(Day)) +
  geom_beeswarm(alpha=0.5) +
  scale_colour_manual(values=species_4pal_alphabetical) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(y="relative density of misclassified",
       title="misclassification in 24h heat for different days")

# clean up
rm(misclass24, misclass24_REAL)

Recall that for 24h duration, Day 1 of serial transfer had 6h of extreme heat at the end, Day 2 was all extreme heat then returned to the “ambient” warm temperature only in the last few hours, and Days 3 & 4 were the recovery days with constant “ambient” warm temperature.

What we see from the plot above is that the misclassification rate can get as high as 20%. It does depend on the day, but the first day of recovery is actually worse than the heat days themselves; still, most replicates and days seem well behaved.

Therefore let’s set 25% as the threshold for contamination. This means that any replicates that show >25% relative density for a species that was not inoculated there are defined as contaminated. All time-points from these contaminated replicates are completely removed from the downstream analysis.

rm(contamin.df) # remove anything we may have had above.

# for now let's define contamination as >25% for something that should not be there.
contamin.df <- the.data %>%  filter((putida == 0 & relDen_putida > 0.25) |
                                    (protegens == 0 & relDen_protegens > 0.25) |
                                    (grimontii == 0 & relDen_grimontii > 0.25) |
                                    (veronii == 0 & relDen_veronii > 0.25))
tmp <- the.data[-which(the.data$uniqID %in% unique(contamin.df$uniqID)),]

###############
# output absolute density data for analysis
###############

# Day 0 would need to be the pre-dilution absolute densities
ggplot(tmp %>% filter(Day==0) %>% select(-uniqID, -OD_well, -Heat, -Incubator) %>% distinct(),
       aes(x=Date, y=Total_density)) +
  geom_beeswarm() +
  labs(y="")
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(tmp %>% filter(Day==0) %>% select(-uniqID, -OD_well, -Heat, -Incubator) %>% distinct(),
       aes(x=Date, y=Volume)) +
  geom_beeswarm() +
  labs(y="")
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).

# I lost the data on flow volume for Date 24-07-02.
# But we see from the plot that there's not *that* much variation between batches.

## IMPORTANT NOTE: the Day 0 data is not featured in any of the downstream analyses for this manuscript. Therefore this interpolation doesn't actually matter.

# let's interpolate the well volumes on Day 0 of 24-07-02 by using the median well volumes for all other dates on Day 0
tmp.Day0 <- tmp[which(tmp$Day==0 & tmp$Incubator=="Epoch"),] %>% select(-uniqID, -OD_well) %>% distinct()
  # get the median volume for Day 0
medianVol <- median(tmp.Day0$Volume, na.rm=TRUE)
  # apply the median volume to Day 0 values from 24-07-02
tmp.Day0 <- tmp.Day0 %>% filter(Date=="24-07-02")
tmp.Day0$Volume <- medianVol
  # recalculate the absolute densities for Day0
tmp.Day0 <- tmp.Day0 %>% mutate(Total_density = Total_counts/Volume,
                                Conc_putida = Count_putida/Volume,
                                Conc_protegens = Count_protegens/Volume,
                                Conc_grimontii = Count_grimontii/Volume,
                                Conc_veronii = Count_veronii/Volume)
# finally, join the estimated absolute densities for Day0 back in with the whole data
tmp.Day0.0702 <- left_join(tmp %>% filter(Day==0, Date=="24-07-02") %>% select(-Volume, -Total_density, -Conc_putida, -Conc_protegens, -Conc_grimontii, -Conc_veronii),
                           tmp.Day0)
## Joining with `by = join_by(Incubator, Plate, Heat, Date, Day, Heat_Day,
## Recov_Day, CommRich, community, putida, protegens, grimontii, veronii,
## Count_putida, Count_protegens, Count_grimontii, Count_veronii, FLOWplateWell,
## Total_counts, relDen_putida, relDen_protegens, relDen_grimontii,
## relDen_veronii)`
tmp <- rbind(tmp %>% filter(Date != "24-07-02"),
             tmp %>% filter(Date == "24-07-02") %>% filter(Day > 0),
             tmp.Day0.0702)

rm(medianVol, tmp.Day0, tmp.Day0.0702)

# finally, remove known miscalled estimates from the data
tmp <- tmp %>% mutate(Conc_putida = putida * Conc_putida,
                      Conc_protegens = protegens * Conc_protegens,
                      Conc_grimontii = grimontii * Conc_grimontii,
                      Conc_veronii = veronii * Conc_veronii) %>%
        mutate(Total_density = Conc_putida + Conc_protegens + Conc_grimontii + Conc_veronii)

# output this data to file
absDensity <- tmp %>% filter(community != "0_0_0_0") %>%
                select(uniqID, Heat, Day, Heat_Day, Recov_Day, CommRich:veronii, Total_density:Conc_veronii)
save(absDensity, file="./intermediate_data/absolute_density_data.RData")

rm(tmp, com, contamin.df)
## Warning in rm(tmp, com, contamin.df): object 'com' not found

Plot final time-series

for(com in unique(absDensity$community)) {
  plot(ggplot(absDensity %>% filter(community==com) %>%
           select(uniqID, Heat, Day, Conc_putida, Conc_protegens, Conc_grimontii, Conc_veronii) %>%
              pivot_longer(cols=starts_with("Conc"), names_to="species", names_prefix="Conc_", values_to="absDensity"),
         aes(x=Day, y=absDensity, colour=species, group=paste(uniqID,Heat,species))) +
    facet_grid(~Heat) +
    geom_point(alpha=0.2) +
    geom_line(alpha=0.5) +
    scale_colour_manual(values=species_4pal_alphabetical) +
    labs(title=com))
}
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).

## (similar warnings, reporting 4 to 48 removed rows in `geom_point()` or `geom_line()`, repeated for the remaining communities)
# in the analyses below we will be interested in shannon diversity so let's already make a column for that
absDensity$Diversity <- diversity(absDensity[,c("Conc_putida", "Conc_protegens", "Conc_grimontii", "Conc_veronii")],
                                  index = "shannon")

# first let's remove the empty wells as we won't need them anymore
absDensity <- absDensity %>% filter(community != "0_0_0_0")
# Note that there are many 0 and NA values for Total_density
summary(absDensity$Total_density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   47.03  100.25  219.98  220.90 2674.00      72
# 0's are communities that went extinct altogether and never recovered
absDensity[which(absDensity$Total_density == 0),]
# most NA's are communities below the threshold of detection during heat that later perhaps recovered
absDensity[which(is.na(absDensity$Total_density) & absDensity$Heat>12),]
# other NA's are just missing data (e.g., due to flow cytometry clogs or just plain pipetting mistakes)
absDensity[which(is.na(absDensity$Total_density) & absDensity$Heat<12),]
# the total density data will have to be slightly adjusted for fitting to the models
absDen_forFit <- absDensity %>% filter(Day > 0)
# for the "raw" total density data that will be fitted via negative binomial GLM,
  # keep the 0's in the data
  # but convert NA's into epsilon values (where epsilon is just below the threshold of detection)
below_threshold_rows <- which(is.na(absDen_forFit$Total_density) & absDen_forFit$Heat>12)
absDen_forFit$Total_density[below_threshold_rows] <- (0.25*50/146)
rm(below_threshold_rows)

# re-arrange the levels of Heat so that emmeans can be run:
absDen_forFit$Heat <- as.character(absDen_forFit$Heat)
absDen_forFit$Heat[which(absDen_forFit$Heat == 0)] <- "control"
# !!! emmeans expects the control to be the very *last* level !!!
absDen_forFit$Heat <- factor(absDen_forFit$Heat,
                             levels = c("6", "12", "24", "48", "control"))

# clean up
rm(com)

After plotting, I also created another version of the full data that will be used for fitting the data to models: `absDen_forFit`. It excludes the Day 0 data (as this will not be analyzed hereafter).

Below, I will analyze the diversity and the productivity (AKA total density) to understand how they change relative to the no heat control during the resistance and the recovery period. The diversity is easier to deal with because we can use the Shannon diversity calculation as implemented by vegan.
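As a reminder of what vegan computes here: the Shannon index is \(H = -\sum_i p_i \ln p_i\) over the relative abundances \(p_i\). A toy check with made-up counts (not data from this experiment):

```r
library(vegan) # already loaded above

# four equally abundant species: H should equal log(4)
x <- c(10, 10, 10, 10)
p <- x / sum(x)
H_manual <- -sum(p * log(p))                 # log(4) ~= 1.386
H_vegan  <- diversity(x, index = "shannon")
all.equal(H_manual, H_vegan)                 # TRUE
```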

For the total abundances, there are extinction events (0’s) and NA events in this data – the extinction events in particular are important and meaningful parts of our data!! To distinguish between 0’s and NA’s, I will use a \(x + \epsilon\) transformation, where \(\epsilon\) indicates samples that are below the threshold of detection. I set \(\epsilon\) to 0.25 * the threshold of detection of the flow cytometer (which is 50 total fluorescent events in \(146\mu L\)).
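A minimal sketch of that substitution, using only the detection threshold stated above (the example density vector is made up):

```r
# epsilon = 0.25 * the detection limit (50 fluorescent events in 146 uL)
epsilon <- 0.25 * 50 / 146
round(epsilon, 4)   # 0.0856 cells/uL

# NA total densities (below detection) become epsilon; true 0's stay 0
Total_density <- c(0, NA, 120.5)
ifelse(is.na(Total_density), epsilon, Total_density)
```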

Load community growth traits expectations

We define the “average growth rate of the community” in two ways. First, as the expectation from the inoculated communities (e.g., the quadruplet community has expected growth rate = the mean of the 4 species). In other words, this assumes that all species inoculated in the community remain at equal ratios, so the average growth rate of a community is just the mean of the growth rates of the inoculated species. This is called the community_expected_mu. (Recall that if we thought the communities would stay fixed at the equal species ratios we inoculated them at, then we should use the geometric mean to calculate the community growth rate. We use the arithmetic mean because we a priori believe that the communities will tend to be dominated by the faster-growing species.)
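To make the arithmetic-vs-geometric distinction concrete, here is a toy comparison with hypothetical growth rates (illustrative values only, not the measured mu’s):

```r
# hypothetical per-species growth rates (illustrative only)
mu <- c(putida = 0.9, protegens = 0.6, grimontii = 0.4, veronii = 0.3)

arith_mu <- mean(mu)            # 0.55 -- the choice used for community_expected_mu
geom_mu  <- exp(mean(log(mu)))  # ~0.50 -- appropriate only if ratios stayed fixed

# the geometric mean is always <= the arithmetic mean, so assuming fixed
# equal ratios would give a systematically lower expectation
c(arithmetic = arith_mu, geometric = round(geom_mu, 2))
```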

Second, we define it as the mean over the realized communities, by using the species’ median relative densities in the no heat control condition. In other words, this takes into account the actual relative densities of the species that can hang out together across serial transfers and uses that as an expectation of the community’s growth rate. This is called the community_averaged_mu.

Another trait that we assayed was whether species are sensitive or resistant to 40C heat (resistance is a binary trait: either TRUE or FALSE). We define communities as expected to be resistant when at least one of the inoculated species is resistant. We define communities as expected to be sensitive when none of the species are resistant.
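The “at least one resistant species” rule can be written generically; since only P. putida is resistant (see the resist.df printout below), it collapses to the presence/absence of putida. A sketch with one example community:

```r
# presence/absence of each species in an example community (putida + protegens)
inoculated        <- c(putida = 1, protegens = 1, grimontii = 0, veronii = 0)
# per-species resistance to 40C heat (only putida grows at 40C)
species_resistant <- c(putida = 1, protegens = 0, grimontii = 0, veronii = 0)

# a community is resistant if any inoculated species is resistant
community_resistant <- as.integer(any(inoculated == 1 & species_resistant == 1))
community_resistant   # 1; equivalent to the putida column here
```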

# load in the stationary phase growth rate estimates from Expt1
load("./intermediate_data/expt1--all_growthcurve_data.RData")
rm(ALL_data.df, derivs.df, TTD.df) # keep just the dataframe with the growth rate estimates (mu)

# a look-up table for growth rates at 30C
growthrates.df <- Dil_growthrates.df %>% filter(Inoculum == "Stationary",
                                                Temp == 30,
                                                Sample %in% c("BSC001", "BSC005", "BSC019", "CK101")) %>%
                        arrange(desc(mu))

# a look-up table for resistance to 40C
resist.df <- Dil_growthrates.df %>% filter(Inoculum == "Stationary",
                                           Temp == 40,
                                           Sample %in% c("BSC001", "BSC005", "BSC019", "CK101")) %>%
                mutate(resistant = ifelse(mu>0, 1, 0)) %>%
                  arrange(match(Sample, c("BSC001", "CK101", "BSC019", "BSC005")))

# calculate the average growth rate for the inoculated communities
absDen_forFit <- absDen_forFit %>% mutate(community_expected_mu = (growthrates.df$mu[1]*putida + growthrates.df$mu[2]*protegens + growthrates.df$mu[3]*grimontii + growthrates.df$mu[4]*veronii)/(putida + protegens + grimontii + veronii))

# calculate the average growth rate for the realized communities
temp <- absDensity %>% filter(Heat == 0) %>% group_by(community) %>%
          mutate(relDen_putida = Conc_putida/Total_density,
                 relDen_protegens = Conc_protegens/Total_density,
                 relDen_grimontii = Conc_grimontii/Total_density,
                 relDen_veronii = Conc_veronii/Total_density) %>%
            summarise(relDen_putida = median(relDen_putida, na.rm = TRUE),
                      relDen_protegens = median(relDen_protegens, na.rm = TRUE),
                      relDen_grimontii = median(relDen_grimontii, na.rm = TRUE),
                      relDen_veronii = median(relDen_veronii, na.rm = TRUE)) %>%
              mutate(community_averaged_mu = growthrates.df$mu[1]*relDen_putida + growthrates.df$mu[2]*relDen_protegens + growthrates.df$mu[3]*relDen_grimontii + growthrates.df$mu[4]*relDen_veronii)

# get the community resistances
print(resist.df %>% select(Species, Sample, Temp, mu, resistant)) # recall that only putida is resistant
##        Species Sample Temp      mu resistant
## 1    P. putida BSC001   40 0.40721         1
## 2 P. protegens  CK101   40 0.00000         0
## 3 P. grimontii BSC019   40 0.00000         0
## 4   P. veronii BSC005   40 0.00000         0
# so it's easy to get community resistance because it's just the presence/absence of putida

# add the information to the full data set
absDen_forFit <- inner_join(absDen_forFit, temp %>% select(community, community_averaged_mu)) %>%
                  mutate(resistant = putida)
## Joining with `by = join_by(community)`
# remember to also add the community growth rates and resistances to the other data frame (this one is used for the extinction analysis because Heat is numeric here)
absDensity <- inner_join(absDensity,
                         absDen_forFit %>%
                           select(community, community_expected_mu, community_averaged_mu, resistant) %>%
                             distinct())
## Joining with `by = join_by(community)`
# clean up
rm(temp, Dil_growthrates.df, resist.df, growthrates.df)

Ordination of communities over time

After backing up a bit and thinking about what the main story of the paper could be, I think the main message that I would like to tell with the paper is that heat duration has a threshold effect. So while shorter and intermediate heat durations have some effect during heat that differs from control, communities return to a similar state after recovery. On the other hand, long-duration heat events lead to extinction (i.e., either of the entire community or of vulnerable species within the community) so the communities cannot recover anymore. In other words, there’s a threshold effect: once the heat-induced (bacterial) killing has gone on for so long that it passes a critical point, the communities recover to a different state. I don’t want to use the term “tipping point” but for sure the design of our experiment allows us to use phrases like “threshold effect” and “critical transition” sensu stricto (e.g., as explained in Munson et al., 2018).

I think it would be fantastic if I could produce a figure that summarizes the entire data in a way that builds an argument for the quintessential ball-landscape schematic that people keep showing when they talk about ecosystem stability to perturbation (e.g., see schematic in Fig. 2 of Shade et al., 2012 or the empirical figure in Fig. 3 of Jurburg et al., 2017).

Here are some tutorials on ordination:
https://eddatascienceees.github.io/tutorial-rayrr13/
https://ourcodingclub.github.io/tutorials/ordination/
https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/anosim/
https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/visualizing-and-interpreting-ordinations/

To calculate the Bray-Curtis dissimilarity, we are forced to choose how to deal with NA values (most of which are found in the resistance time points and so it doesn’t really make sense to outright drop them). NA values exist for two reasons:

  1. “true” missing data where the well was not acquired at all due to technical difficulties/mistakes (only a few of this type). I used interpolation to deal with these: use the median value from other replicates at that same community:day:treatment.

  2. below-threshold-of-detection missing data, where the total density was too low to reliably estimate the cell counts. I replace the NA with \(\epsilon\) (0.25 * the limit of detection of the cytometer = 0.25*50/146, as above) and assume equal frequencies of the species that were inoculated in that community.

If we just follow the example tutorials directly (columns = the four species, rows = different communities on different days under different heat treatments), then the ordination simply splits the data up by species. But that’s not what we want to understand in this case.

We want to understand how the communities are changing over time (and as a function of different heat durations) so let’s give it the data as species x time. This can be achieved by widening the data so that we have abundances of the 4 species during resistance, during early recovery, and during late recovery.

Note that I also had to keep just 3 time points from the control treatment. I chose to keep day 1 (coded as “resistance”), day 3 (coded as “early recovery”), and day 5 (coded as “late recovery”) because this way the ordination plot will show the control treatment early, middle, and late in the time series…

# go back to the complete data that includes NA values for all 4 species on some days
absDen_forOrd <- absDen_forFit %>% select(-Diversity, -community_expected_mu, -community_averaged_mu, -resistant)
# NA values with Total_density == NA are "true" missing data where I failed to record the flow cytometry measurements on that day due to technical difficulties/mistakes. These can be interpolated by using the median values from the remaining community replicates
  ## get the median values for all communities, days, and heat treatments
median_vals <- absDen_forOrd %>% group_by(Heat, Day, community) %>%
                  summarise(Med_putida = median(Conc_putida, na.rm=TRUE),
                            Med_protegens = median(Conc_protegens, na.rm=TRUE),
                            Med_grimontii = median(Conc_grimontii, na.rm=TRUE),
                            Med_veronii = median(Conc_veronii, na.rm=TRUE))
## `summarise()` has grouped output by 'Heat', 'Day'. You can override using the
## `.groups` argument.
  ## get the index for the rows with "true" missing values
missing_rows <- which(is.na(absDen_forOrd$Total_density))
  ## loop through the missing values
for(i in missing_rows){
  # find the interpolation value in the table of median values
  temp_med_val <- median_vals[median_vals$Heat == absDen_forOrd$Heat[i] &
                                median_vals$Day == absDen_forOrd$Day[i] &
                                median_vals$community == absDen_forOrd$community[i],]
  # replace the NA values with the median values
  absDen_forOrd$Conc_putida[i] <- temp_med_val$Med_putida
  absDen_forOrd$Conc_protegens[i] <- temp_med_val$Med_protegens
  absDen_forOrd$Conc_grimontii[i] <- temp_med_val$Med_grimontii
  absDen_forOrd$Conc_veronii[i] <- temp_med_val$Med_veronii
  # clean up
  rm(temp_med_val)
}
# clean up
rm(median_vals, missing_rows, i)

# on the other hand, NA values where Total_density is epsilon represent flow cytometry counts that were below the threshold of detection. In this case let's assume 1:1 ratios of inoculated strains at a total density equal to epsilon.
epsilon <- (0.25*50/146)
  ## get the index for the missing value rows below the threshold of detection
missing_rows <- which(is.na(absDen_forOrd$Conc_putida))
  ## CommRich NA values were supposed to indicate some differences but that doesn't really matter for us anymore
absDen_forOrd$CommRich <- absDen_forOrd$putida + absDen_forOrd$protegens + absDen_forOrd$grimontii + absDen_forOrd$veronii
for(i in missing_rows){
  # replace the NA values with epsilon divided by the inoculated species richness
  absDen_forOrd$Conc_putida[i] <- absDen_forOrd$putida[i] * epsilon / absDen_forOrd$CommRich[i]
  absDen_forOrd$Conc_protegens[i] <- absDen_forOrd$protegens[i] * epsilon / absDen_forOrd$CommRich[i]
  absDen_forOrd$Conc_grimontii[i] <- absDen_forOrd$grimontii[i] * epsilon / absDen_forOrd$CommRich[i]
  absDen_forOrd$Conc_veronii[i] <- absDen_forOrd$veronii[i] * epsilon / absDen_forOrd$CommRich[i]
}
# re-order the levels of Heat for better plotting
absDen_forOrd$Heat <- factor(absDen_forOrd$Heat, levels=c("control", "6", "12", "24", "48"))
# finally, we can drop the total density column
absDen_forOrd <- absDen_forOrd %>% select(-Total_density)
rm(epsilon, missing_rows, i)

# first we have to widen the data:
# create a column that indicates the treatment day as resistance, early recovery, or late recovery
absDen_forOrd$trtmt_day <- "resist"
absDen_forOrd$trtmt_day[absDen_forOrd$Recov_Day == 1] <- "early_recov"
absDen_forOrd$trtmt_day[absDen_forOrd$Recov_Day == 2] <- "late_recov"
# ENTIRELY ARBITRARILY: I will keep days 1, 3, and 5 for control
absDen_forOrd$trtmt_day[absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 3] <- "early_recov"
absDen_forOrd$trtmt_day[absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 5] <- "late_recov"
# remove day 1 for 12h, 24h, 48h AND day 2 for 48h.
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 12 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 24 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 48 & absDen_forOrd$Day == 1), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == 48 & absDen_forOrd$Day == 2), ]
# also remove day 2 and day 4 for control.
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 2), ]
absDen_forOrd <- absDen_forOrd[!(absDen_forOrd$Heat == "control" & absDen_forOrd$Day == 4), ]

# pivot wider to create a column for each of the 4 species on each of the 3 days
absDen_wide_forOrd <- absDen_forOrd %>% select(-Day, -Heat_Day, -Recov_Day) %>%
                          pivot_wider(names_from = trtmt_day,
                                      values_from = c(Conc_putida, Conc_protegens, Conc_grimontii, Conc_veronii))

# re-name the species abundance over time columns so they are shorter (again for better plotting)
colnames(absDen_wide_forOrd)[9:20] <- c("Pu_Resist", "Pu_earlyR", "Pu_lateR",
                                        "Pt_Resist", "Pt_earlyR", "Pt_lateR",
                                        "Gi_Resist", "Gi_earlyR", "Gi_lateR",
                                        "Vn_Resist", "Vn_earlyR", "Vn_lateR")

Now that we have the wide data, let’s calculate the distances and do the ordination with NMDS:

# The final result depends on the initial random placement of the points 
# set seed to make the results reproducible
set.seed(64576)

# keep just the species abundances
abundance_matrix <- as.matrix(absDen_wide_forOrd[,9:20])

# a function to automatically run the NMDS for k = 1 to 10 so we can choose an appropriately small number of dimensions for ordination
NMDS.scree <- function(mat) { # mat is the abundance matrix
  data.frame(k = 1:10,
            # autotransform the data before calculating the bray-curtis dissimilarity
            stress = sapply(1:10, function(x) metaMDS(mat, distance = "bray", k = x, autotransform = TRUE)$stress))
}
scree_out <- NMDS.scree(abundance_matrix)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.2423303 
## Run 1 stress 0.2929796 
## Run 2 stress 0.2955769 
## Run 3 stress 0.2514977 
## Run 4 stress 0.2583327 
## Run 5 stress 0.2520582 
## Run 6 stress 0.2943351 
## Run 7 stress 0.2621376 
## Run 8 stress 0.3169983 
## Run 9 stress 0.2648957 
## Run 10 stress 0.3086427 
## Run 11 stress 0.3007618 
## Run 12 stress 0.2330685 
## ... New best solution
## ... Procrustes: rmse 0.03139865  max resid 0.1011646 
## Run 13 stress 0.3033715 
## Run 14 stress 0.258922 
## Run 15 stress 0.2490456 
## Run 16 stress 0.2491303 
## Run 17 stress 0.2540668 
## Run 18 stress 0.2583461 
## Run 19 stress 0.2690558 
## Run 20 stress 0.3103607 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##      2: stress ratio > sratmax
##     18: scale factor of the gradient < sfgrmin
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.09939261 
## Run 1 stress 0.1740985 
## Run 2 stress 0.1425599 
## Run 3 stress 0.1526817 
## Run 4 stress 0.1063753 
## Run 5 stress 0.1412482 
## Run 6 stress 0.1629482 
## Run 7 stress 0.1121552 
## Run 8 stress 0.1304401 
## Run 9 stress 0.1423918 
## Run 10 stress 0.126843 
## Run 11 stress 0.1365722 
## Run 12 stress 0.1431993 
## Run 13 stress 0.1247585 
## Run 14 stress 0.1354528 
## Run 15 stress 0.09933652 
## ... New best solution
## ... Procrustes: rmse 0.001291752  max resid 0.0227284 
## Run 16 stress 0.09997366 
## Run 17 stress 0.1401592 
## Run 18 stress 0.1119758 
## Run 19 stress 0.1426391 
## Run 20 stress 0.1624407 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##      9: stress ratio > sratmax
##     11: scale factor of the gradient < sfgrmin
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.05653678 
## Run 1 stress 0.05653715 
## ... Procrustes: rmse 0.0002103611  max resid 0.001156087 
## ... Similar to previous best
## Run 2 stress 0.05804981 
## Run 3 stress 0.05839031 
## Run 4 stress 0.05668292 
## ... Procrustes: rmse 0.01514321  max resid 0.04493888 
## Run 5 stress 0.05881388 
## Run 6 stress 0.05829128 
## Run 7 stress 0.05761679 
## Run 8 stress 0.05813329 
## Run 9 stress 0.05822053 
## Run 10 stress 0.05653658 
## ... New best solution
## ... Procrustes: rmse 0.0002146318  max resid 0.001125414 
## ... Similar to previous best
## Run 11 stress 0.05881759 
## Run 12 stress 0.05668246 
## ... Procrustes: rmse 0.01514773  max resid 0.04472723 
## Run 13 stress 0.05849046 
## Run 14 stress 0.05805011 
## Run 15 stress 0.0565368 
## ... Procrustes: rmse 0.0002165806  max resid 0.001150051 
## ... Similar to previous best
## Run 16 stress 0.05832403 
## Run 17 stress 0.05660328 
## ... Procrustes: rmse 0.0007965385  max resid 0.01407932 
## Run 18 stress 0.05741877 
## Run 19 stress 0.05867473 
## Run 20 stress 0.05660381 
## ... Procrustes: rmse 0.0008098417  max resid 0.01409199 
## *** Best solution repeated 2 times
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03100195 
## Run 1 stress 0.03100886 
## ... Procrustes: rmse 0.001860282  max resid 0.01964811 
## Run 2 stress 0.03100655 
## ... Procrustes: rmse 0.001809436  max resid 0.01502119 
## Run 3 stress 0.03738253 
## Run 4 stress 0.03122818 
## ... Procrustes: rmse 0.007731228  max resid 0.04668352 
## Run 5 stress 0.03101757 
## ... Procrustes: rmse 0.002122981  max resid 0.01976892 
## Run 6 stress 0.03142072 
## ... Procrustes: rmse 0.003940614  max resid 0.02538769 
## Run 7 stress 0.03113716 
## ... Procrustes: rmse 0.004024748  max resid 0.02070315 
## Run 8 stress 0.03783273 
## Run 9 stress 0.0309996 
## ... New best solution
## ... Procrustes: rmse 0.001577177  max resid 0.01490436 
## Run 10 stress 0.03137454 
## ... Procrustes: rmse 0.01834442  max resid 0.04944586 
## Run 11 stress 0.0315096 
## Run 12 stress 0.03139354 
## ... Procrustes: rmse 0.01836707  max resid 0.04939437 
## Run 13 stress 0.03115586 
## ... Procrustes: rmse 0.00262926  max resid 0.01393858 
## Run 14 stress 0.03100132 
## ... Procrustes: rmse 0.0001637252  max resid 0.001250598 
## ... Similar to previous best
## Run 15 stress 0.03157253 
## Run 16 stress 0.03121361 
## ... Procrustes: rmse 0.007626543  max resid 0.04481529 
## Run 17 stress 0.03139407 
## ... Procrustes: rmse 0.01836436  max resid 0.049624 
## Run 18 stress 0.03102481 
## ... Procrustes: rmse 0.0008602552  max resid 0.002865027 
## ... Similar to previous best
## Run 19 stress 0.03100917 
## ... Procrustes: rmse 0.001200415  max resid 0.01856019 
## Run 20 stress 0.03123723 
## ... Procrustes: rmse 0.004080291  max resid 0.01130745 
## *** Best solution repeated 2 times
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.02267932 
## Run 1 stress 0.02285945 
## ... Procrustes: rmse 0.005318605  max resid 0.05163845 
## Run 2 stress 0.0230721 
## ... Procrustes: rmse 0.01454088  max resid 0.05888222 
## Run 3 stress 0.0228286 
## ... Procrustes: rmse 0.006325837  max resid 0.03319399 
## Run 4 stress 0.02309125 
## ... Procrustes: rmse 0.009797497  max resid 0.05369451 
## Run 5 stress 0.02349799 
## Run 6 stress 0.02307173 
## ... Procrustes: rmse 0.008578092  max resid 0.04838935 
## Run 7 stress 0.02353592 
## Run 8 stress 0.02272706 
## ... Procrustes: rmse 0.004663163  max resid 0.05220668 
## Run 9 stress 0.02319235 
## Run 10 stress 0.0228929 
## ... Procrustes: rmse 0.005067756  max resid 0.04936397 
## Run 11 stress 0.0226904 
## ... Procrustes: rmse 0.003821472  max resid 0.03055636 
## Run 12 stress 0.02313611 
## ... Procrustes: rmse 0.01495805  max resid 0.06110281 
## Run 13 stress 0.02331595 
## Run 14 stress 0.02314887 
## ... Procrustes: rmse 0.01368614  max resid 0.04752004 
## Run 15 stress 0.02379419 
## Run 16 stress 0.02279778 
## ... Procrustes: rmse 0.003705714  max resid 0.02500858 
## Run 17 stress 0.02268476 
## ... Procrustes: rmse 0.001134939  max resid 0.009911439 
## ... Similar to previous best
## Run 18 stress 0.02280568 
## ... Procrustes: rmse 0.002783335  max resid 0.02614025 
## Run 19 stress 0.0230307 
## ... Procrustes: rmse 0.01533704  max resid 0.05930438 
## Run 20 stress 0.02352092 
## *** Best solution repeated 1 times
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.01745044 
## Run 1 stress 0.01760697 
## ... Procrustes: rmse 0.005431853  max resid 0.05534993 
## Run 2 stress 0.01775461 
## ... Procrustes: rmse 0.01184043  max resid 0.05826288 
## Run 3 stress 0.01792373 
## ... Procrustes: rmse 0.004385319  max resid 0.02640242 
## Run 4 stress 0.01809884 
## Run 5 stress 0.01930814 
## Run 6 stress 0.01768445 
## ... Procrustes: rmse 0.01193104  max resid 0.06252274 
## Run 7 stress 0.01925184 
## Run 8 stress 0.01817458 
## Run 9 stress 0.01777304 
## ... Procrustes: rmse 0.009759783  max resid 0.07249081 
## Run 10 stress 0.01788223 
## ... Procrustes: rmse 0.008599536  max resid 0.06154664 
## Run 11 stress 0.01780884 
## ... Procrustes: rmse 0.0128207  max resid 0.06227581 
## Run 12 stress 0.01771042 
## ... Procrustes: rmse 0.01202047  max resid 0.06367507 
## Run 13 stress 0.01806552 
## Run 14 stress 0.01819434 
## Run 15 stress 0.01760785 
## ... Procrustes: rmse 0.008155502  max resid 0.07146103 
## Run 16 stress 0.01811591 
## Run 17 stress 0.01782993 
## ... Procrustes: rmse 0.01126684  max resid 0.06561554 
## Run 18 stress 0.01773849 
## ... Procrustes: rmse 0.01184637  max resid 0.05862861 
## Run 19 stress 0.01769879 
## ... Procrustes: rmse 0.01131141  max resid 0.05964816 
## Run 20 stress 0.01778223 
## ... Procrustes: rmse 0.01158636  max resid 0.05531058 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.01470232 
## Run 1 stress 0.01426383 
## ... New best solution
## ... Procrustes: rmse 0.0130729  max resid 0.08329273 
## Run 2 stress 0.01416691 
## ... New best solution
## ... Procrustes: rmse 0.005022937  max resid 0.03049654 
## Run 3 stress 0.01452055 
## ... Procrustes: rmse 0.01417726  max resid 0.08112892 
## Run 4 stress 0.01467882 
## Run 5 stress 0.01470182 
## Run 6 stress 0.01472892 
## Run 7 stress 0.01467751 
## Run 8 stress 0.0141736 
## ... Procrustes: rmse 0.01187823  max resid 0.0702224 
## Run 9 stress 0.01436686 
## ... Procrustes: rmse 0.01308783  max resid 0.08694689 
## Run 10 stress 0.01536665 
## Run 11 stress 0.01428692 
## ... Procrustes: rmse 0.009589543  max resid 0.09133483 
## Run 12 stress 0.01468029 
## Run 13 stress 0.01437372 
## ... Procrustes: rmse 0.01349171  max resid 0.06950188 
## Run 14 stress 0.01489049 
## Run 15 stress 0.01444754 
## ... Procrustes: rmse 0.0124289  max resid 0.06838532 
## Run 16 stress 0.01441679 
## ... Procrustes: rmse 0.0124974  max resid 0.06475054 
## Run 17 stress 0.01484764 
## Run 18 stress 0.01436579 
## ... Procrustes: rmse 0.01133914  max resid 0.04183535 
## Run 19 stress 0.01578763 
## Run 20 stress 0.01424384 
## ... Procrustes: rmse 0.008622062  max resid 0.09263191 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.01159271 
## Run 1 stress 0.01231995 
## Run 2 stress 0.01231675 
## Run 3 stress 0.0123291 
## Run 4 stress 0.0129412 
## Run 5 stress 0.01194928 
## ... Procrustes: rmse 0.01346614  max resid 0.06435481 
## Run 6 stress 0.01267622 
## Run 7 stress 0.01223814 
## Run 8 stress 0.01234434 
## Run 9 stress 0.0122798 
## Run 10 stress 0.01216852 
## Run 11 stress 0.01219052 
## Run 12 stress 0.01228166 
## Run 13 stress 0.01232808 
## Run 14 stress 0.01224609 
## Run 15 stress 0.0125952 
## Run 16 stress 0.01238599 
## Run 17 stress 0.01234447 
## Run 18 stress 0.01212564 
## Run 19 stress 0.0120421 
## ... Procrustes: rmse 0.007790014  max resid 0.03850533 
## Run 20 stress 0.01213426 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.01006807 
## Run 1 stress 0.0112219 
## Run 2 stress 0.01055157 
## ... Procrustes: rmse 0.01387935  max resid 0.09326126 
## Run 3 stress 0.01064219 
## Run 4 stress 0.01079995 
## Run 5 stress 0.01042679 
## ... Procrustes: rmse 0.009481296  max resid 0.0529539 
## Run 6 stress 0.01116961 
## Run 7 stress 0.01078218 
## Run 8 stress 0.01082747 
## Run 9 stress 0.01107196 
## Run 10 stress 0.01126544 
## Run 11 stress 0.01091599 
## Run 12 stress 0.01065413 
## Run 13 stress 0.01031789 
## ... Procrustes: rmse 0.008644761  max resid 0.04281108 
## Run 14 stress 0.01065736 
## Run 15 stress 0.01096792 
## Run 16 stress 0.01078167 
## Run 17 stress 0.01089812 
## Run 18 stress 0.01066146 
## Run 19 stress 0.01062589 
## Run 20 stress 0.01053113 
## ... Procrustes: rmse 0.01075643  max resid 0.04501416 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.009111295 
## Run 1 stress 0.009548517 
## ... Procrustes: rmse 0.01094137  max resid 0.03988506 
## Run 2 stress 0.009753849 
## Run 3 stress 0.01021714 
## Run 4 stress 0.009436933 
## ... Procrustes: rmse 0.009662144  max resid 0.02805097 
## Run 5 stress 0.009806361 
## Run 6 stress 0.009444433 
## ... Procrustes: rmse 0.01102185  max resid 0.05086678 
## Run 7 stress 0.009686076 
## Run 8 stress 0.009712265 
## Run 9 stress 0.009614569 
## Run 10 stress 0.009869996 
## Run 11 stress 0.009745525 
## Run 12 stress 0.009736319 
## Run 13 stress 0.009690711 
## Run 14 stress 0.009649312 
## Run 15 stress 0.009351891 
## ... Procrustes: rmse 0.008567265  max resid 0.03372346 
## Run 16 stress 0.009980753 
## Run 17 stress 0.00934312 
## ... Procrustes: rmse 0.0113435  max resid 0.04504891 
## Run 18 stress 0.009714768 
## Run 19 stress 0.009600306 
## ... Procrustes: rmse 0.01202499  max resid 0.05843118 
## Run 20 stress 0.009611901 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
plot(scree_out)

# k=3 looks great
try.NMDS <- metaMDS(abundance_matrix, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.05653678 
## Run 1 stress 0.05811525 
## Run 2 stress 0.05828934 
## Run 3 stress 0.0569203 
## ... Procrustes: rmse 0.01578648  max resid 0.0501326 
## Run 4 stress 0.0567028 
## ... Procrustes: rmse 0.01516076  max resid 0.04484438 
## Run 5 stress 0.05833604 
## Run 6 stress 0.05667964 
## ... Procrustes: rmse 0.01513649  max resid 0.04494314 
## Run 7 stress 0.05660351 
## ... Procrustes: rmse 0.0008001714  max resid 0.01397935 
## Run 8 stress 0.05828482 
## Run 9 stress 0.05667946 
## ... Procrustes: rmse 0.01513509  max resid 0.04488485 
## Run 10 stress 0.05668324 
## ... Procrustes: rmse 0.0151378  max resid 0.04487964 
## Run 11 stress 0.05834023 
## Run 12 stress 0.05725508 
## Run 13 stress 0.05653688 
## ... Procrustes: rmse 0.0001937895  max resid 0.001162355 
## ... Similar to previous best
## Run 14 stress 0.0565416 
## ... Procrustes: rmse 0.0004543333  max resid 0.004691431 
## ... Similar to previous best
## Run 15 stress 0.05811077 
## Run 16 stress 0.05653636 
## ... New best solution
## ... Procrustes: rmse 0.0001097333  max resid 0.001258089 
## ... Similar to previous best
## Run 17 stress 0.05653649 
## ... Procrustes: rmse 9.910708e-05  max resid 0.0007607637 
## ... Similar to previous best
## Run 18 stress 0.05828872 
## Run 19 stress 0.05713269 
## Run 20 stress 0.05689516 
## ... Procrustes: rmse 0.01582619  max resid 0.05162991 
## *** Best solution repeated 2 times
# check the stress value: it should be < 0.2, ideally < 0.05 (though very low stress can indicate a matrix dominated by zero values)
try.NMDS$stress
## [1] 0.05653636
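Beyond the single stress number, vegan also provides a Shepard diagram for the same fitted object, which shows how faithfully the ordination distances reproduce the observed dissimilarities (a quick diagnostic sketch, reusing `try.NMDS` from above):

```r
# Shepard diagram: observed Bray-Curtis dissimilarities vs. ordination distances;
# a tight scatter around the monotone step line indicates a good fit
stressplot(try.NMDS)
```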
# let's get a general idea of what this NMDS is separating...

# plot the results for axis 1 & 2
ordiplot(try.NMDS, type = "n") # create blank ordination plot
orditorp(try.NMDS, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 1 & 3
ordiplot(try.NMDS, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(try.NMDS, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 2 & 3
ordiplot(try.NMDS, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(try.NMDS, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(try.NMDS, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red

# we already know that presence/absence of protegens is consistently the most important factor for all communities, so let's see if that shows up here.
# Let's switch over to ggplot to be certain that everything is labelled correctly.

# define a function (related to vegan) that finds coordinates for drawing a covariance ellipse
  # CREDIT: THIS COMES FROM ONE OF THE TUTORIALS ABOVE!!!
veganCovEllipse <- function (cov, center = c(0, 0), scale = 1, npoints = 100) {
  # returns the coordinates of a covariance ellipse (centroid + dispersion),
  # computed below for each level of a grouping factor of your choice
  theta <- (0:npoints) * 2 * pi/npoints
  Circle <- cbind(cos(theta), sin(theta))
  t(center + scale * t(Circle %*% chol(cov)))
}
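As an aside, for base-graphics figures vegan's built-in `ordiellipse()` can draw comparable group ellipses directly, without the manual `chol()`-based construction (a sketch, assuming `try.NMDS` and the `Heat` grouping column are available at this point):

```r
# standard-deviation ellipses per Heat group on ordination axes 1 & 2
ordiplot(try.NMDS, type = "n")
ordiellipse(try.NMDS, groups = absDen_wide_forOrd$Heat,
            kind = "sd", draw = "polygon", alpha = 50, label = TRUE)
```

The manual function is used here instead because ggplot needs the ellipse coordinates as a plain data frame.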

nmds_for_ggplot <- cbind(absDen_wide_forOrd[,1:8],
                         as.data.frame(scores(try.NMDS)$sites))
# create a new factor that defines the combination of heat and protegens
nmds_for_ggplot <- nmds_for_ggplot %>% unite("HeatxProtegens", c(Heat, protegens), remove = FALSE)
nmds_for_ggplot$HeatxProtegens <- factor(nmds_for_ggplot$HeatxProtegens,
                                         levels = c("6_0", "6_1", "12_0", "12_1", "24_0", "24_1", "48_0", "48_1", "control_0", "control_1"))



# create empty dataframes to combine NMDS data with ellipse data
ellipse12_df <- ellipse13_df <- ellipse23_df <- data.frame() # numbers indicate the ordination axes
# adding data for ellipses, using HeatxProtegens as a grouping factor
for(g in levels(nmds_for_ggplot$HeatxProtegens)){
  ellipse12_df <- rbind(ellipse12_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS1, NMDS2),
                                                                            wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
                                                                     center=c(mean(NMDS1),mean(NMDS2)))))
                                  , HeatxProtegens=g))
  ellipse13_df <- rbind(ellipse13_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS1, NMDS3),
                                                                            wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
                                                                     center=c(mean(NMDS1),mean(NMDS3)))))
                                  , HeatxProtegens=g))
  ellipse23_df <- rbind(ellipse23_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS2, NMDS3),
                                                                            wt=rep(1/length(NMDS2),length(NMDS2)))$cov,
                                                                     center=c(mean(NMDS2),mean(NMDS3)))))
                                  , HeatxProtegens=g))
}

# now we separate the HeatxProtegens column back into its components:
ellipse12_df <- ellipse12_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse12_df$Heat <- factor(ellipse12_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse13_df <- ellipse13_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse13_df$Heat <- factor(ellipse13_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse23_df <- ellipse23_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse23_df$Heat <- factor(ellipse23_df$Heat, levels = levels(nmds_for_ggplot$Heat))

nmds_for_ggplot$protegens <- as.character(nmds_for_ggplot$protegens) # this needs to be discrete (could also be a factor)

# and finally we can make the plots:
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + # adding different colours and shapes for points at different distances
    geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + # removes lines from colour part of the legend
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + # re-apply the custom theme to over-ride ggplot2's default grey theme
    labs(title="NMDS of all data (4sp & 3 time-points)")

# axes 1 & 2 again showing just the ellipses (bc it's hard to see protegens effects as it's so overlapped)
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
    geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # plot just the ellipses
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + # re-apply the custom theme to over-ride ggplot2's default grey theme
    labs(title="NMDS of all data (4sp & 3 time-points)")

ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + 
    geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="NMDS of all data (4sp & 3 time-points)")

# axes 1 & 3 again showing just the ellipses
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
    geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="NMDS of all data (4sp & 3 time-points)")

ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + 
    geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="NMDS of all data (4sp & 3 time-points)")

# axes 2 & 3 again showing just the ellipses (bc it's hard to see protegens effects as it's so overlapped)
ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
    geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="NMDS of all data (4sp & 3 time-points)")

################
# check significance:
# using a PERMANOVA to test the differences in community composition
# PERMANOVA (PERmutational Multivariate ANalysis Of VAriance) compares groups like an ANOVA,
# but operates on a multivariate dissimilarity matrix and assesses significance by permutation
nmdsdata_test_Heat <- adonis2(abundance_matrix ~ Heat, absDen_wide_forOrd,
                              permutations = 999, method = "bray")
print(nmdsdata_test_Heat)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance_matrix ~ Heat, data = absDen_wide_forOrd, permutations = 999, method = "bray")
##           Df SumOfSqs      R2      F Pr(>F)    
## Model      4    8.188 0.07534 6.4981  0.001 ***
## Residual 319  100.485 0.92466                  
## Total    323  108.673 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nmdsdata_test_Prot <- adonis2(abundance_matrix ~ protegens, absDen_wide_forOrd,
                              permutations = 999, method = "bray")
print(nmdsdata_test_Prot)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance_matrix ~ protegens, data = absDen_wide_forOrd, permutations = 999, method = "bray")
##           Df SumOfSqs      R2      F Pr(>F)    
## Model      1   45.211 0.41603 229.39  0.001 ***
## Residual 322   63.462 0.58397                  
## Total    323  108.673 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nmdsdata_test_HeatxProt <- adonis2(abundance_matrix ~ Heat * protegens, absDen_wide_forOrd,
                                   permutations = 999, method = "bray")
print(nmdsdata_test_HeatxProt)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance_matrix ~ Heat * protegens, data = absDen_wide_forOrd, permutations = 999, method = "bray")
##           Df SumOfSqs      R2      F Pr(>F)    
## Model      9   60.966 0.56101 44.586  0.001 ***
## Residual 314   47.706 0.43899                  
## Total    323  108.673 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# so these are all significant, but is that spurious because the dispersion differs between groups? (e.g., much smaller when protegens is present)

##############
# check PERMANOVA assumption of homogeneous group variances
# Bray-curtis distance matrix
dist_mat <- vegdist(abundance_matrix, method = "bray")

# use betadisper test to check for multivariate homogeneity of group variances
dispersion <- betadisper(dist_mat, group = paste(absDen_wide_forOrd$Heat, absDen_wide_forOrd$protegens))
## Warning in betadisper(dist_mat, group = paste(absDen_wide_forOrd$Heat,
## absDen_wide_forOrd$protegens)): some squared distances are negative and changed
## to zero
permutest(dispersion)
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##            Df  Sum Sq Mean Sq      F N.Perm Pr(>F)    
## Groups      9  8.7974 0.97749 23.077    999  0.001 ***
## Residuals 314 13.3006 0.04236                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Yes: dispersions differ significantly between groups, so we need a test that is robust to heterogeneous group variances...
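As a follow-up, vegan's betadisper objects also support `TukeyHSD`, which gives pairwise confidence intervals for differences in group dispersion and can show which groups drive the heterogeneity (a sketch, reusing the `dispersion` object from above):

```r
# pairwise differences in multivariate dispersion between groups
disp_tukey <- TukeyHSD(dispersion)
plot(disp_tukey, las = 1) # intervals that exclude zero indicate clearly different dispersions
```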

################
# check significance:
# let's test for significance again using ANOSIM (another non-parametric test, but one based only on the ranks of the dissimilarities)
nmdsdata_test2_HeatxProt <- anosim(dist_mat,
                                   grouping = paste(absDen_wide_forOrd$Heat, absDen_wide_forOrd$protegens),
                                   permutations = 999)
plot(nmdsdata_test2_HeatxProt)
## Warning in (function (z, notch = FALSE, width = NULL, varwidth = FALSE, : some
## notches went outside hinges ('box'): maybe set notch=FALSE

summary(nmdsdata_test2_HeatxProt)
## 
## Call:
## anosim(x = dist_mat, grouping = paste(absDen_wide_forOrd$Heat,      absDen_wide_forOrd$protegens), permutations = 999) 
## Dissimilarity: bray 
## 
## ANOSIM statistic R: 0.6212 
##       Significance: 0.001 
## 
## Permutation: free
## Number of permutations: 999
## 
## Upper quantiles of permutations (null model):
##    90%    95%  97.5%    99% 
## 0.0110 0.0150 0.0173 0.0204 
## 
## Dissimilarity ranks between and within classes:
##              0%      25%     50%     75%    100%     N
## Between    15.5 15656.00 28220.0 44062.5 44062.5 47065
## 12 0      156.0  4841.00 22156.0 25514.0 44062.5   253
## 12 1       37.0  2073.25  6635.0 10519.0 18965.0   780
## 24 0       15.5  6506.00 22860.0 44062.5 44062.5   465
## 24 1      102.0  3052.00  5746.0  9735.5 17602.0   595
## 48 0       15.5 15176.00 26611.5 44062.5 44062.5   406
## 48 1       33.0  1305.00  3454.0  8128.5 20766.0   595
## 6 0        88.0  6343.00 21367.0 44062.5 44062.5   561
## 6 1        35.0  1550.50  3749.0  6333.5 19718.0   780
## control 0 685.0  8820.00 20181.0 20994.5 44062.5   231
## control 1  32.0   729.00  1856.0  4199.5 17322.0   595

Great! This summarizes the same result found with the other indices: the presence of P. protegens is the most important factor. Communities where this species was present look quite similar across the different heat treatments. Longer heat durations push the communities in a different direction, until a threshold is reached at the longest heat treatment (48h).

The NMDS ordination results are significant by PERMANOVA, but the assumptions of that test may be violated because the dispersion is heterogeneous between groups. ANOSIM should be somewhat more robust to this problem because it uses only ranks, and the group differences remain significant by ANOSIM.

################################
# Plot figure for main text: Figure 3b
################################
# change protegens values for better plotting
nmds_for_ggplot$P_protegens <- "absent"
nmds_for_ggplot$P_protegens[nmds_for_ggplot$protegens == 1] <- "present"

ellipse12_df$P_protegens <- "absent"
ellipse12_df$P_protegens[ellipse12_df$protegens == 1] <- "present"

ellipse13_df$P_protegens <- "absent"
ellipse13_df$P_protegens[ellipse13_df$protegens == 1] <- "present"

# change Heat values for better plotting
levels(nmds_for_ggplot$Heat)[2:5] <- paste(levels(nmds_for_ggplot$Heat)[2:5], "hrs")
levels(ellipse12_df$Heat)[2:5] <- paste(levels(ellipse12_df$Heat)[2:5], "hrs")
levels(ellipse13_df$Heat)[2:5] <- paste(levels(ellipse13_df$Heat)[2:5], "hrs")

# create the plot of 1 vs 2:
plot1_2 <- ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
            geom_point(aes(color = Heat, shape = P_protegens), size=2, alpha=0.4) + # adding different colours and shapes for points at different distances
            geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=P_protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
            guides(color = guide_legend(override.aes = list(linetype=rep(NA,5),# removes lines from colour part of the legend
                                                  alpha=1, size=3)), # make the points opaque and bigger in the colour part of the legend
            shape = guide_legend(override.aes = list(size=3))) + # make the points bigger in the greyscale part of the legend
            scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9)

# plot 1 vs 2 with the legend ... I will extract the legend from here
png(filename="./figures/Fig3_A_legend.png", width = 3.48, height = 3.41, units = "in", res=300)
print(plot1_2)
dev.off()
## png 
##   2
# plot 1 vs 2 without the legend
png(filename="./figures/Fig3_A_axis1vs2.png", width = 5.35, height = 3.78, units = "in", res=300)
print(plot1_2 + theme(legend.position="none"))
dev.off()
## png 
##   2
# plot 1 vs 3 without the legend
png(filename="./figures/Fig3_A_axis1vs3.png", width = 5.35, height = 3.78, units = "in", res=300)

ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
    geom_point(aes(color = Heat, shape = P_protegens), size=2, alpha=0.4) + 
    geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=P_protegens), linewidth=1) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    theme(legend.position="none")

dev.off()
## png 
##   2

Gradient with 48h

# let's check if the gradient of heat pulse duration is significant:
absDen_wide_forOrd$Heat <- as.character(levels(absDen_wide_forOrd$Heat))[absDen_wide_forOrd$Heat]
absDen_wide_forOrd$Heat[absDen_wide_forOrd$Heat == "control"] <- 0
absDen_wide_forOrd$Heat <- as.numeric(absDen_wide_forOrd$Heat)

# let's see what the heat gradient looks like
gg_ordiplot(try.NMDS, groups = absDen_wide_forOrd$protegens, plot = TRUE)

# here's another way to do it:
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5) # notice this gradient is not significant!!!

gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5, choices=c(1,3)) # notice this gradient is not significant!!!

gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, plot = TRUE, alpha=0.5, choices=c(2,3)) # notice this gradient is not significant!!!

# display the p-value:
gg_envfit(try.NMDS, env = absDen_wide_forOrd$Heat, groups = absDen_wide_forOrd$protegens, alpha=0.5, plot = FALSE)$df_arrows$p.val
## [1] 0.295

The gradient is NOT significant.
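For reference, the same fit can be obtained directly from vegan's `envfit()`, whose print method reports the squared correlation and the permutation p-value together (a sketch, reusing objects defined above; `Heat` is numeric at this point, so it is fitted as a vector):

```r
# fit the heat-duration gradient onto the ordination directly with vegan
heat_fit <- envfit(try.NMDS, data.frame(Heat = absDen_wide_forOrd$Heat),
                   permutations = 999)
heat_fit # prints the Heat vector with r2 and Pr(>r)
```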

Gradient without 48h

Let’s re-do the analysis, this time removing the 48h duration, to check whether the gradient becomes significant.

# exclude 48h duration data
absDen_wide_forOrd_no48 <- absDen_wide_forOrd %>% filter(Heat < 48)
abundance_mat_no48 <- as.matrix(absDen_wide_forOrd_no48[,9:20])
# re-do the NMDS with k=3
NMDS_no48 <- metaMDS(abundance_mat_no48, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03610698 
## Run 1 stress 0.0361706 
## ... Procrustes: rmse 0.004523705  max resid 0.0209621 
## Run 2 stress 0.03587647 
## ... New best solution
## ... Procrustes: rmse 0.008280525  max resid 0.03289523 
## Run 3 stress 0.03586422 
## ... New best solution
## ... Procrustes: rmse 0.009772168  max resid 0.03282395 
## Run 4 stress 0.0361075 
## ... Procrustes: rmse 0.0049532  max resid 0.02134297 
## Run 5 stress 0.0361076 
## ... Procrustes: rmse 0.004952867  max resid 0.02135668 
## Run 6 stress 0.03632809 
## ... Procrustes: rmse 0.009324124  max resid 0.03207945 
## Run 7 stress 0.03792442 
## Run 8 stress 0.03587761 
## ... Procrustes: rmse 0.009770886  max resid 0.03279242 
## Run 9 stress 0.0361707 
## ... Procrustes: rmse 0.001574116  max resid 0.02088598 
## Run 10 stress 0.04846144 
## Run 11 stress 0.03610699 
## ... Procrustes: rmse 0.004969727  max resid 0.02132521 
## Run 12 stress 0.03624383 
## ... Procrustes: rmse 0.008163905  max resid 0.03274627 
## Run 13 stress 0.03656112 
## Run 14 stress 0.04613172 
## Run 15 stress 0.0363021 
## ... Procrustes: rmse 0.00152127  max resid 0.02025857 
## Run 16 stress 0.03610761 
## ... Procrustes: rmse 0.004952837  max resid 0.02135751 
## Run 17 stress 0.03617057 
## ... Procrustes: rmse 0.001576836  max resid 0.02086885 
## Run 18 stress 0.03585721 
## ... New best solution
## ... Procrustes: rmse 0.005268631  max resid 0.02116026 
## Run 19 stress 0.03595363 
## ... Procrustes: rmse 0.00973805  max resid 0.03298728 
## Run 20 stress 0.03775638 
## Run 21 stress 0.03776485 
## Run 22 stress 0.03587669 
## ... Procrustes: rmse 0.008254363  max resid 0.03305972 
## Run 23 stress 0.04624875 
## Run 24 stress 0.03587662 
## ... Procrustes: rmse 0.008228895  max resid 0.03289349 
## Run 25 stress 0.03607472 
## ... Procrustes: rmse 0.004978904  max resid 0.02099788 
## Run 26 stress 0.03607386 
## ... Procrustes: rmse 0.004984494  max resid 0.02107825 
## Run 27 stress 0.03617053 
## ... Procrustes: rmse 0.004835244  max resid 0.02103437 
## Run 28 stress 0.03820177 
## Run 29 stress 0.03607535 
## ... Procrustes: rmse 0.004969459  max resid 0.02114723 
## Run 30 stress 0.04853968 
## Run 31 stress 0.03784573 
## Run 32 stress 0.03607467 
## ... Procrustes: rmse 0.004986658  max resid 0.02111265 
## Run 33 stress 0.03586417 
## ... Procrustes: rmse 0.005263557  max resid 0.02144332 
## Run 34 stress 0.0363281 
## ... Procrustes: rmse 0.008242741  max resid 0.03229225 
## Run 35 stress 0.03595435 
## ... Procrustes: rmse 0.009738699  max resid 0.03296331 
## Run 36 stress 0.03645808 
## Run 37 stress 0.03595357 
## ... Procrustes: rmse 0.009745052  max resid 0.03299968 
## Run 38 stress 0.03610661 
## ... Procrustes: rmse 0.00135609  max resid 0.01935981 
## Run 39 stress 0.0363037 
## ... Procrustes: rmse 0.004650823  max resid 0.02069567 
## Run 40 stress 0.03587627 
## ... Procrustes: rmse 0.008237771  max resid 0.03288605 
## Run 41 stress 0.03624235 
## ... Procrustes: rmse 0.009515786  max resid 0.03278193 
## Run 42 stress 0.03609896 
## ... Procrustes: rmse 0.008214127  max resid 0.03256114 
## Run 43 stress 0.03595322 
## ... Procrustes: rmse 0.009745371  max resid 0.03298626 
## Run 44 stress 0.03610723 
## ... Procrustes: rmse 0.001331733  max resid 0.01934939 
## Run 45 stress 0.03587669 
## ... Procrustes: rmse 0.008254437  max resid 0.03306022 
## Run 46 stress 0.03617061 
## ... Procrustes: rmse 0.004865983  max resid 0.0210081 
## Run 47 stress 0.03586504 
## ... Procrustes: rmse 0.005265935  max resid 0.02146238 
## Run 48 stress 0.03617104 
## ... Procrustes: rmse 0.004845602  max resid 0.02091001 
## Run 49 stress 0.03586405 
## ... Procrustes: rmse 0.005264017  max resid 0.0214265 
## Run 50 stress 0.04703046 
## Run 51 stress 0.0362726 
## ... Procrustes: rmse 0.009413914  max resid 0.03248227 
## Run 52 stress 0.03607465 
## ... Procrustes: rmse 0.004966884  max resid 0.02113944 
## Run 53 stress 0.03610716 
## ... Procrustes: rmse 0.001328212  max resid 0.0193493 
## Run 54 stress 0.03586437 
## ... Procrustes: rmse 0.00527586  max resid 0.02152297 
## Run 55 stress 0.03631954 
## ... Procrustes: rmse 0.005103146  max resid 0.02342217 
## Run 56 stress 0.05410207 
## Run 57 stress 0.03586391 
## ... Procrustes: rmse 0.005277148  max resid 0.02147091 
## Run 58 stress 0.03607545 
## ... Procrustes: rmse 0.0049688  max resid 0.02116238 
## Run 59 stress 0.03586361 
## ... Procrustes: rmse 0.005278219  max resid 0.02142871 
## Run 60 stress 0.03608585 
## ... Procrustes: rmse 0.008242746  max resid 0.03287899 
## Run 61 stress 0.03587691 
## ... Procrustes: rmse 0.008255639  max resid 0.03306867 
## Run 62 stress 0.0360749 
## ... Procrustes: rmse 0.004984373  max resid 0.02114165 
## Run 63 stress 0.03586413 
## ... Procrustes: rmse 0.005263647  max resid 0.02143811 
## Run 64 stress 0.03610729 
## ... Procrustes: rmse 0.001402614  max resid 0.01934451 
## Run 65 stress 0.03610706 
## ... Procrustes: rmse 0.00139432  max resid 0.01934721 
## Run 66 stress 0.03667862 
## Run 67 stress 0.03827359 
## Run 68 stress 0.03585661 
## ... New best solution
## ... Procrustes: rmse 0.0001460725  max resid 0.0004428963 
## ... Similar to previous best
## *** Best solution repeated 1 times
# check the stress value. For the data with 48h it was 0.05653636
NMDS_no48$stress # it's smaller than before
## [1] 0.03585661
# let's get a general idea of what this NMDS is separating...
# plot the results for axis 1 & 2
ordiplot(NMDS_no48, type = "n") # create blank ordination plot
orditorp(NMDS_no48, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 1 & 3
ordiplot(NMDS_no48, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 2 & 3
ordiplot(NMDS_no48, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red

# test for significance
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, alpha=0.5)

# print out the p-value (plot = FALSE: we only need the statistic here)
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = FALSE, alpha=0.5)$df_arrows$p.val

## [1] 0.224

The NMDS looks similar to when the 48h heat pulse data was included, though not identical, as is to be expected. The gradient is NOT significant when we remove all of the 48h heat pulse data.
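As a quick goodness-of-fit check to complement the stress value, vegan also provides a Shepard diagram for `metaMDS` objects. A sketch (not run above):

```r
# Shepard diagram: observed Bray-Curtis dissimilarities vs. ordination distances;
# a tight monotone fit corresponds to the low stress value reported above
stressplot(NMDS_no48)
```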

Gradient with 48h protegens but without the other 48h communities

# exclude 48h duration data
absDen_wide_forOrd_no48 <- rbind(absDen_wide_forOrd %>% filter(Heat < 48),
                                 absDen_wide_forOrd %>% filter(Heat == 48, protegens == 1))
abundance_mat_no48 <- as.matrix(absDen_wide_forOrd_no48[,9:20])
# re-do the NMDS with k=3
NMDS_no48 <- metaMDS(abundance_mat_no48, distance = "bray", k = 3, autotransform = TRUE, trymax=100)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03706032 
## Run 1 stress 0.04547765 
## Run 2 stress 0.05064102 
## Run 3 stress 0.03716541 
## ... Procrustes: rmse 0.01666652  max resid 0.05646784 
## Run 4 stress 0.03703048 
## ... New best solution
## ... Procrustes: rmse 0.01560765  max resid 0.04445792 
## Run 5 stress 0.03703069 
## ... Procrustes: rmse 0.0002904334  max resid 0.004232298 
## ... Similar to previous best
## Run 6 stress 0.03719355 
## ... Procrustes: rmse 0.0008520842  max resid 0.01321004 
## Run 7 stress 0.03716485 
## ... Procrustes: rmse 0.006032667  max resid 0.03788253 
## Run 8 stress 0.03719313 
## ... Procrustes: rmse 0.0007887533  max resid 0.01320485 
## Run 9 stress 0.03707538 
## ... Procrustes: rmse 0.01557246  max resid 0.04406662 
## Run 10 stress 0.05222347 
## Run 11 stress 0.03719385 
## ... Procrustes: rmse 0.0007832657  max resid 0.01322475 
## Run 12 stress 0.0370302 
## ... New best solution
## ... Procrustes: rmse 0.0003310697  max resid 0.004292492 
## ... Similar to previous best
## Run 13 stress 0.03758154 
## Run 14 stress 0.03706025 
## ... Procrustes: rmse 0.01560889  max resid 0.04371931 
## Run 15 stress 0.03706038 
## ... Procrustes: rmse 0.01560757  max resid 0.0434899 
## Run 16 stress 0.03738869 
## ... Procrustes: rmse 0.005864962  max resid 0.03818162 
## Run 17 stress 0.03716565 
## ... Procrustes: rmse 0.006034456  max resid 0.03822598 
## Run 18 stress 0.05054459 
## Run 19 stress 0.0372091 
## ... Procrustes: rmse 0.01559203  max resid 0.0438836 
## Run 20 stress 0.04326073 
## *** Best solution repeated 1 times
# check the stress value. For the data with 48h it was 0.05653636
NMDS_no48$stress # it's smaller than before
## [1] 0.0370302
# let's get a general idea of what this NMDS is separating...
# plot the results for axis 1 & 2
ordiplot(NMDS_no48, type = "n") # create blank ordination plot
orditorp(NMDS_no48, display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 1 & 3
ordiplot(NMDS_no48, choices = c(1,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(1,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(1,3), display = "species", col="red", air = 0.1) # add species names in red

# plot the results for axis 2 & 3
ordiplot(NMDS_no48, choices = c(2,3), type = "n") # create blank ordination plot
orditorp(NMDS_no48, choices = c(2,3), display = "sites", cex = 0.5, air = 0.1) # add row numbers in black
orditorp(NMDS_no48, choices = c(2,3), display = "species", col="red", air = 0.1) # add species names in red

# test for significance of environmental vector:
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE)

gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, choices=c(1,3))

gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = TRUE, choices=c(2,3))

# print out the p-value
gg_envfit(NMDS_no48, env = absDen_wide_forOrd_no48$Heat, groups = absDen_wide_forOrd_no48$protegens, plot = FALSE, alpha=0.5)$df_arrows$p.val
## [1] 0.001
# okay now that we see this is significant, let's make a pretty plot of the NMDS to include in the supplement
  # first we have to return Heat to a factor with appropriate levels
absDen_wide_forOrd_no48$Heat <- factor(absDen_wide_forOrd_no48$Heat,
                                       levels = c(0, 6, 12, 24, 48))
levels(absDen_wide_forOrd_no48$Heat)[1] <- "control"


nmds_for_ggplot <- cbind(absDen_wide_forOrd_no48[,1:8],
                         as.data.frame(scores(NMDS_no48)$sites))

# create a new factor that defines the combination of heat and protegens
nmds_for_ggplot <- nmds_for_ggplot %>% unite("HeatxProtegens", c(Heat, protegens), remove = FALSE)
nmds_for_ggplot$HeatxProtegens <- factor(nmds_for_ggplot$HeatxProtegens,
                                         levels = c("6_0", "6_1", "12_0", "12_1", "24_0", "24_1", "48_1", "control_0", "control_1"))



# create empty dataframes to combine NMDS data with ellipse data
ellipse12_df <- ellipse13_df <- ellipse23_df <- data.frame() # numbers indicate the ordination axes
# adding data for ellipses, using HeatxProtegens as a grouping factor
for(g in levels(nmds_for_ggplot$HeatxProtegens)){
  ellipse12_df <- rbind(ellipse12_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS1, NMDS2),
                                                                            wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
                                                                     center=c(mean(NMDS1),mean(NMDS2)))))
                                  , HeatxProtegens=g))
  ellipse13_df <- rbind(ellipse13_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS1, NMDS3),
                                                                            wt=rep(1/length(NMDS1),length(NMDS1)))$cov,
                                                                     center=c(mean(NMDS1),mean(NMDS3)))))
                                  , HeatxProtegens=g))
  ellipse23_df <- rbind(ellipse23_df, cbind(as.data.frame(with(nmds_for_ggplot[nmds_for_ggplot$HeatxProtegens==g,],
                                                     veganCovEllipse(cov.wt(cbind(NMDS2, NMDS3),
                                                                            wt=rep(1/length(NMDS2),length(NMDS2)))$cov,
                                                                     center=c(mean(NMDS2),mean(NMDS3)))))
                                  , HeatxProtegens=g))
}

# now we separate the HeatxProtegens columns:
ellipse12_df <- ellipse12_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse12_df$Heat <- factor(ellipse12_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse13_df <- ellipse13_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse13_df$Heat <- factor(ellipse13_df$Heat, levels = levels(nmds_for_ggplot$Heat))
ellipse23_df <- ellipse23_df %>% separate(HeatxProtegens, c("Heat", "protegens"))
ellipse23_df$Heat <- factor(ellipse23_df$Heat, levels = levels(nmds_for_ggplot$Heat))

nmds_for_ggplot$protegens <- as.character(nmds_for_ggplot$protegens) # this needs to be discrete (could also be a factor)

# and finally we can make the plots:
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS2)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + # adding different colours and shapes for points at different distances
    geom_path(data=ellipse12_df, aes(x=NMDS1, y=NMDS2, colour=Heat, linetype=protegens), linewidth=1) + # adding covariance ellipses according to distance # use size argument if ggplot2 < v. 3.4.0
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + # removes lines from colour part of the legend
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + # custom theme defined earlier; needed to override ggplot's default grey theme
    labs(title="exclude: 48h without protegens")

# axis 1 vs 3
ggplot(data = nmds_for_ggplot, aes(NMDS1, NMDS3)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + 
    geom_path(data=ellipse13_df, aes(x=NMDS1, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="exclude: 48h without protegens")

# axis 2 vs 3
ggplot(data = nmds_for_ggplot, aes(NMDS2, NMDS3)) +
    geom_point(aes(color = Heat, shape = protegens), alpha=0.4) + 
    geom_path(data=ellipse23_df, aes(x=NMDS2, y=NMDS3, colour=Heat, linetype=protegens), linewidth=1) + 
    guides(color = guide_legend(override.aes = list(linetype=rep(NA,5)))) + 
    scale_colour_viridis_d(option = "plasma", begin=0.05, end = 0.9) +
    fave_theme + 
    labs(title="exclude: 48h without protegens")

# clean up
rm(abundance_matrix, scree_out, try.NMDS, nmds_for_ggplot, ellipse12_df, ellipse13_df, ellipse23_df, nmdsdata_test_Heat, nmdsdata_test_Prot, nmdsdata_test_HeatxProt, dist_mat, dispersion, nmdsdata_test2_HeatxProt, absDen_forOrd, absDen_wide_forOrd, g, plot1_2, absDen_wide_forOrd_no48, abundance_mat_no48, NMDS_no48)

Richness

Let’s summarize the main result: P. protegens dominates every community where it was inoculated. Species richness is conceptually a good metric for this … but recall that the flow cytometry data has some rate of misclassification (in some cases as much as 20% !!!). So we need richness estimates that account for each species’ relative abundance and therefore down-weight rare (possibly misclassified) species.
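As a reminder of what these metrics do, the Hill diversity of order q is D_q = (Σ p_i^q)^(1/(1−q)), with the q → 1 limit equal to the exponential of Shannon entropy; larger q down-weights rare species more strongly. An illustrative sketch (not the `calcDiv` implementation itself):

```r
# Illustrative Hill diversity of order q for a vector of relative abundances p
hill <- function(p, q) {
  p <- p[p > 0]
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

p <- c(0.94, 0.02, 0.02, 0.02) # one dominant species plus three rare ones
hill(p, 0) # 4: richness counts every species equally
hill(p, 1) # ~1.34: rare species are discounted
hill(p, 2) # ~1.13: rare species barely register
```

This is why the q = 1 and q = 2 metrics are more robust to the flow cytometry misclassification than raw richness (q = 0).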

Plot

species_div.df <- absDen_forFit %>% mutate(relden_putida = Conc_putida/Total_density,
                                 relden_protegens = Conc_protegens/Total_density,
                                 relden_grimontii = Conc_grimontii/Total_density,
                                 relden_veronii = Conc_veronii/Total_density)
species_div.df <- species_div.df %>% mutate(HillEven_q0 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
        type = "HillEven",
        q=0)),
                              HillEven_q1 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
        type = "HillEven",
        q=1)),
        HillEven_q2 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
        type = "HillEven",
        q=2)),
        HillDiv_q1 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
        type = "HillDiv",
        q=1)),
        HillDiv_q2 = unlist(calcDiv(sampleData = species_div.df[,c("relden_putida","relden_protegens","relden_grimontii","relden_veronii")],
        type = "HillDiv",
        q=2)))

ggplot(species_div.df,
       aes(y=HillEven_q0, x=Day, colour=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(species_div.df,
       aes(y=HillEven_q1, x=Day, colour=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(species_div.df,
       aes(y=HillEven_q2, x=Day, colour=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "viridis", begin=0.1)
## Warning: Removed 110 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(species_div.df,
       aes(y=HillDiv_q1, x=Day, colour=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "viridis")
## Warning: Removed 56 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(species_div.df,
       aes(y=HillDiv_q2, x=Day, colour=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_point(alpha=0.5) +
  scale_colour_viridis_d(option = "viridis")
## Warning: Removed 56 rows containing missing values or values outside the scale range
## (`geom_point()`).

# this one gives infinite values. That's not useful.


################################
# Plot figure for main text: Figure 3b
################################

# The Hill Diversity with q=1 seems useful!! Let's include this plot in the final manuscript:
# re-order  the levels of heat so that the control appears first
species_div.df$Heat <- factor(species_div.df$Heat,
                              levels = levels(species_div.df$Heat)[c(5,1:4)])
# change variable names for nicer plotting
levels(species_div.df$Heat)[2:5] <- paste(levels(species_div.df$Heat)[2:5], "hrs")
species_div.df$Pprot_facet <- ifelse(species_div.df$protegens == 0, "P. protegens absent", "P. protegens present")

# create a data.frame for plotting red rectangles in the background
bckgrd <- data.frame(Heat=levels(species_div.df$Heat),
                     HillDiv_q1 = c(0, rep(2.4, 4)),
                     Day = c(0, rep(0.8, 4))) # all heat treatments start at the same time
test <- rbind(bckgrd,
              data.frame(Heat=levels(species_div.df$Heat),
                         HillDiv_q1 = c(0, rep(2.4, 4)),
                         Day = c(0, 1.1, 1.5, 1.9, 2.9))) # choose end points that look good even if not perfectly accurate

png(filename = "./figures/Fig3_B.png", width = 6.98, height = 4.52, units = "in", res = 300)

ggplot(species_div.df %>% filter(CommRich > 1),
        # exclude monocultures bc richness is not informative for these: their richness will always be equal to 1 even when they go extinct. 
       aes(y=HillDiv_q1, x=Day)) +
  facet_grid(Pprot_facet ~ as.factor(Heat),
             scales = "free_x", # allow x-axis of facets to freely choose their own max values
             space = "free_x") + # allow facet columns to differ in their sizes
  # add red rectangles in the background to indicate heat treatment as in Fig 1.
  geom_ribbon(data=test, aes(ymin=0.95, ymax=2.4), # add a bit of padding above & below the points to look nice
              position = "identity", # probably unnecessary now that geom_area is no longer used, but harmless
              fill="#C43131", alpha=0.25) + # use same colour and alpha as in fill for Fig. 1
  # use beeswarm to jitter the points properly (alpha must be set to 0 bc of red rectangles)
  geom_quasirandom(aes(fill=as.factor(CommRich)), # fill as a function of CommRich *must* be inside of this function otherwise it leads to a lot of problems with the geom_ribbon layer
                   pch=21) + # use points with fill and border bc they look nicer here
  scale_fill_viridis_d(option = "viridis", begin=0.85, end=0) +
  scale_x_continuous(breaks=1:5, # tick marks & tick labels only at integers
                     limits=c(0.5,NA), # add a little extra padding on the left side of x-axis bc I think it looks better
                     expand = c(0, 0.2)) + # for some reason this prevents points in the 6h facet from getting squished up against the right border of the facet
  scale_y_continuous(expand = c(0, 0)) + # this prevents ggplot from adding any extra padding on the y-axis; we already added the default +/- 0.05 padding manually when we specified geom_ribbon
  labs(y = "Observed Richness", # 1st order Hill Diversity
       fill = "Inoculated\nRichness") +
  guides(fill = guide_legend(override.aes = list(size=3), # make the points bigger in the legend
                             title.hjust = 0.5)) + # justify the 2nd line of legend title & points so they sit nicely under the 1st line
  theme(strip.background = element_rect(fill = "white", colour="black"),
        strip.text = element_text(color = "black"),
        legend.box.spacing = margin(0.5), # stretch plot rightwards & closer to its legend
        legend.title = element_text(size=14)) # set the legend title size
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
dev.off()
## png 
##   2
# clean up
rm(species_div.df, bckgrd, test)
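The per-facet shaded background above is driven by a small auxiliary data frame (`bckgrd`/`test`) with one row per facet level. A minimal, self-contained sketch of the same trick, with hypothetical data and `geom_rect` in place of `geom_ribbon`:

```r
library(ggplot2)

# one row per facet level controls how far the shading extends in that panel
shade <- data.frame(facet = c("a", "b"), xmax = c(1, 3))
pts <- data.frame(facet = rep(c("a", "b"), each = 5),
                  x = rep(1:5, 2), y = rnorm(10))

ggplot(pts, aes(x, y)) +
  facet_wrap(~ facet) +
  geom_rect(data = shade,
            aes(xmin = 0, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "#C43131", alpha = 0.25) +
  geom_point()
```

The key is `inherit.aes = FALSE`, so the shading layer reads only its own data frame and ggplot matches rows to panels via the shared faceting variable.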

Notice that for the control condition in the absence of protegens, there is a trend of decreasing species richness over time (i.e., as the communities approach equilibrium).

Analysis (omitted)

I have effect sizes on richness for resistance and recovery but I’m not sure if this is going to make it into the manuscript… (see the analyze_temp_serial_transfer_expt--28Oct24.html if you’re interested).

Extinction

Which communities were most likely to go extinct? How long did the heat duration have to be in order to drive those communities to extinction?

The simplest hypothesis is that heat duration alone explains whether a community goes extinct.

A slightly more complex hypothesis, based on the thermal performance curve data (Fig. 2), is that any species resistant to heat should be less likely to go extinct, even under long-duration heat. We would therefore expect the presence/absence of the heat-resistant species, P. putida, to explain whether a community goes extinct.

Maddy’s hypothesis in setting up this experiment was that higher inoculated species richness would make a community more resistant to heat. So we will check whether inoculated species richness has any effect.

Another hypothesis that emerges from looking at the time series data itself (e.g., the ordination data) is that protegens has a unique effect on all communities where it is present. So let’s check that model as well.

Analysis

# keep just the data on the last day of each time series
extinct.df <- absDensity %>% filter(Recov_Day == 2)
extinct.df <- rbind(extinct.df,
                    absDensity %>% filter(Heat == 0, Day == 5))
# binary vector of survival or extinction
extinct.df <- extinct.df %>% mutate(survived = ifelse(Total_density > 0, 1, 0))
### note that sample "24-07-08 Epoch G1" has missing data on Day 5 even though we know from the OD data that it survived.
extinct.df$survived[extinct.df$uniqID == "24-07-08 Epoch G1"] <- 1

# make protegens into a factor
extinct.df$protegens <- factor(extinct.df$protegens)

# fit the models
ext_mod.heat <- glmmTMB(survived ~ Heat,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat, plot = TRUE)

ext_mod.heat_plus_rich <- glmmTMB(survived ~ CommRich + Heat,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_plus_rich, plot = TRUE)

ext_mod.heat_plus_prot <- glmmTMB(survived ~ Heat + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.heat_plus_prot, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.9645332 0.959227 0.02965619 0.2301006 0.1030454 0.9776571 0.827906 0.3086776 0.4034651 0.7309843 0.8247881 0.9087348 0.7163956 0.5443066 0.7949462 0.4422744 0.08401695 0.5889396 0.5640054 0.7787241 ...
ext_mod.heat_plus_resist <- glmmTMB(survived ~ Heat + resistant,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_plus_resist, plot = TRUE)


ext_mod.heat_by_rich <- glmmTMB(survived ~ CommRich*Heat,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_by_rich, plot = TRUE)

ext_mod.heat_by_prot <- glmmTMB(survived ~ Heat*protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
#simulateResiduals(fittedModel = ext_mod.heat_by_prot, plot = TRUE)


ext_mod.heat_rich_prot <- glmmTMB(survived ~ CommRich + Heat + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.heat_rich_prot, plot = TRUE)

ext_mod.rich_heatXprot <- glmmTMB(survived ~ CommRich + Heat*protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.rich_heatXprot, plot = TRUE)

#anova(ext_mod.heat, ext_mod.heat_plus_rich)
#anova(ext_mod.heat, ext_mod.heat_by_rich)
#anova(ext_mod.heat_plus_rich, ext_mod.heat_by_rich)

AIC(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(AIC)
AICc(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(AICc)
BIC(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist) %>% arrange(BIC)
summary(ext_mod.heat_rich_prot)
##  Family: binomial  ( logit )
## Formula:          survived ~ CommRich + Heat + protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     62.1     77.2    -27.0     54.1      320 
## 
## 
## Conditional model:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.688e+00  1.512e+00   3.763 0.000168 ***
## CommRich     5.080e-01  5.035e-01   1.009 0.312982    
## Heat        -1.445e-01  3.096e-02  -4.666 3.07e-06 ***
## protegens1   2.305e+01  1.592e+04   0.001 0.998844    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ext_mod.heat_plus_prot)
##  Family: binomial  ( logit )
## Formula:          survived ~ Heat + protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     61.1     72.5    -27.6     55.1      321 
## 
## 
## Conditional model:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.459e+00  1.340e+00   4.820 1.43e-06 ***
## Heat        -1.425e-01  3.038e-02  -4.690 2.73e-06 ***
## protegens1   2.363e+01  1.849e+04   0.001    0.999    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(ext_mod.heat, ext_mod.heat_plus_prot)
anova(ext_mod.heat_plus_prot, ext_mod.heat_rich_prot)
# and let's report the R-squared for this 
efronRSquared(residual = residuals(ext_mod.heat_plus_prot, type="response"), 
              predicted = predict(ext_mod.heat_plus_prot, type="response"), 
              statistic = "EfronRSquared")
## EfronRSquared 
##         0.501

This tells us that the most important predictors are (1) the duration of the heat event and (2) the presence/absence of protegens in the inoculated community. This model explains about 50% of the variation in the data. We have little power to detect an effect of inoculated community richness on extinction.
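To make the logit-scale coefficients concrete, here is an illustrative back-of-envelope conversion to survival probabilities for protegens-absent communities, using the rounded coefficients printed above (use `predict()` on the fitted model for anything quotable):

```r
# rounded coefficients from summary(ext_mod.heat_plus_prot) above
b0 <- 6.459
b_heat <- -0.1425
plogis(b0 + b_heat * c(0, 6, 12, 24, 48))
# roughly 0.998 0.996 0.991 0.954 0.406 -- survival only collapses near 48h
```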

A final possibility is that the growth rates of the different communities explain whether they go extinct. Let’s check whether the average growth rate of the community at 30°C can predict its extinction…

ext_mod.expect_mu <- glmmTMB(survived ~ Heat + community_expected_mu,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.expect_mu, plot = TRUE)

ext_mod.exptmu_prot <- glmmTMB(survived ~ Heat + community_expected_mu + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.exptmu_prot, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.9628912 0.9384698 0.02965619 0.4072284 0.1428897 0.9776571 0.6471349 0.3086776 0.6884299 0.7309843 0.8014265 0.9087348 0.7163956 0.5443066 0.7949462 0.5218201 0.08401695 0.5889396 0.05015453 0.7787241 ...
ext_mod.exptmuXprot <- glmmTMB(survived ~ Heat + community_expected_mu*protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.exptmuXprot, plot = TRUE)
ext_mod.heatXexptmu_prot <- glmmTMB(survived ~ Heat*community_expected_mu + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
simulateResiduals(fittedModel = ext_mod.heatXexptmu_prot, plot = TRUE)
ext_mod.heatXprot_exptmu <- glmmTMB(survived ~ Heat*protegens + community_expected_mu,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
simulateResiduals(fittedModel = ext_mod.heatXprot_exptmu, plot = TRUE)
ext_mod.heatXprotXexptmu <- glmmTMB(survived ~ Heat*protegens*community_expected_mu,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
simulateResiduals(fittedModel = ext_mod.heatXprotXexptmu, plot = TRUE)
ext_mod.averaged_mu <- glmmTMB(survived ~ Heat + community_averaged_mu,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
#simulateResiduals(fittedModel = ext_mod.averaged_mu, plot = TRUE)

ext_mod.avemu_prot <- glmmTMB(survived ~ Heat + community_averaged_mu + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = ext_mod.avemu_prot, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.4089731 0.6516057 0.5394315 0.4501793 0.9890527 0.5333557 0.2309646 0.3948581 0.6868779 0.02586429 0.5943815 0.08103522 0.8765983 0.7344452 0.2819697 0.6605734 0.1427851 0.5192184 0.291833 0.4861997 ...
ext_mod.exptmu_prot_resist <- glmmTMB(survived ~ Heat + community_expected_mu + protegens + resistant,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; singular convergence (7). See vignette('troubleshooting'),
## help('diagnose')
#simulateResiduals(fittedModel = ext_mod.exptmu_prot_resist, plot = TRUE)

summary(ext_mod.exptmu_prot)
##  Family: binomial  ( logit )
## Formula:          survived ~ Heat + community_expected_mu + protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     42.1     57.2    -17.0     34.1      320 
## 
## 
## Conditional model:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -3.080e+00  3.035e+00  -1.015  0.31019    
## Heat                  -2.152e-01  5.309e-02  -4.054 5.03e-05 ***
## community_expected_mu  1.463e+01  5.024e+00   2.912  0.00359 ** 
## protegens1             2.433e+01  2.154e+04   0.001  0.99910    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ext_mod.avemu_prot)
##  Family: binomial  ( logit )
## Formula:          survived ~ Heat + community_averaged_mu + protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     35.8     51.0    -13.9     27.8      320 
## 
## 
## Conditional model:
##                         Estimate Std. Error z value Pr(>|z|)  
## (Intercept)              -3.2158     2.9907  -1.075   0.2823  
## Heat                     -0.5403     0.2515  -2.149   0.0317 *
## community_averaged_mu    26.8226    12.9368   2.073   0.0381 *
## protegens1               27.7150 24864.3060   0.001   0.9991  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# this model does not converge because of the experimental design (some predictor combinations produce complete separation)
summary(ext_mod.exptmu_prot_resist)
##  Family: binomial  ( logit )
## Formula:          
## survived ~ Heat + community_expected_mu + protegens + resistant
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##       NA       NA       NA       NA      319 
## 
## 
## Conditional model:
##                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)             36.833        NaN     NaN      NaN
## Heat                    -1.711        NaN     NaN      NaN
## community_expected_mu    8.399        NaN     NaN      NaN
## protegens1              79.339        NaN     NaN      NaN
## resistant               37.915        NaN     NaN      NaN
## this should go in a single table instead of 3 different ones:
AIC(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist, ext_mod.exptmuXprot, ext_mod.heatXexptmu_prot, ext_mod.heatXprot_exptmu, ext_mod.heatXprotXexptmu) %>% arrange(AIC)
AICc(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist, ext_mod.exptmuXprot, ext_mod.heatXexptmu_prot, ext_mod.heatXprot_exptmu, ext_mod.heatXprotXexptmu) %>% arrange(AICc)
BIC(ext_mod.heat, ext_mod.heat_plus_prot, ext_mod.rich_heatXprot, ext_mod.heat_plus_resist, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, ext_mod.exptmu_prot_resist, ext_mod.exptmuXprot, ext_mod.heatXexptmu_prot, ext_mod.heatXprot_exptmu, ext_mod.heatXprotXexptmu) %>% arrange(BIC)
# wow, I'm shocked that this growth rate model is actually better. Let's double check that...
anova(ext_mod.heat_plus_prot, ext_mod.exptmu_prot)
anova(ext_mod.heat_plus_prot, ext_mod.avemu_prot)
# let's check if the expected_mu model with interactions is able to get a better estimate on the effect of protegens:
summary(ext_mod.exptmuXprot) # no, it's the same
##  Family: binomial  ( logit )
## Formula:          survived ~ Heat + community_expected_mu * protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     44.1     63.0    -17.0     34.1      319 
## 
## 
## Conditional model:
##                                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      -3.080e+00  3.035e+00  -1.015  0.31019    
## Heat                             -2.152e-01  5.309e-02  -4.054 5.03e-05 ***
## community_expected_mu             1.463e+01  5.024e+00   2.912  0.00359 ** 
## protegens1                        1.917e+01  4.099e+05   0.000  0.99996    
## community_expected_mu:protegens1  6.392e+00  5.051e+05   0.000  0.99999    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# get the 95% confidence intervals:
print("expected mu:")
## [1] "expected mu:"
confint(ext_mod.exptmu_prot)
##                               2.5 %        97.5 %   Estimate
## (Intercept)               -9.028488     2.8685403 -3.0799739
## Heat                      -0.319264    -0.1111684 -0.2152162
## community_expected_mu      4.784301    24.4766683 14.6304845
## protegens1            -42192.404379 42241.0685341 24.3320775
print("realized average mu:")
## [1] "realized average mu:"
confint(ext_mod.avemu_prot)
##                               2.5 %        97.5 %   Estimate
## (Intercept)               -9.077401  2.645851e+00 -3.2157748
## Heat                      -1.033214 -4.744673e-02 -0.5403302
## community_averaged_mu      1.466930  5.217824e+01 26.8225864
## protegens1            -48705.429291  4.876086e+04 27.7149920
# and let's report the R-squared
efronRSquared(residual = residuals(ext_mod.exptmu_prot, type="response"), 
              predicted = predict(ext_mod.exptmu_prot, type="response"), 
              statistic = "EfronRSquared")
## EfronRSquared 
##         0.707
efronRSquared(residual = residuals(ext_mod.avemu_prot, type="response"), 
              predicted = predict(ext_mod.avemu_prot, type="response"), 
              statistic = "EfronRSquared")
## EfronRSquared 
##         0.747

Ok, so neither community_expected_mu nor community_averaged_mu is as good a predictor as just heat duration plus presence/absence of protegens. But when we add the presence/absence of protegens to either community_expected_mu or community_averaged_mu, we get very good models that are significantly better than the one reported above: the model with just heat duration and protegens presence/absence explains 50% of the variation, while the growth-rate models with protegens explain 70-75%. Another way to report this result is to give the \(\Delta\)AIC or \(\Delta\)BIC.

Most of the models with community_expected_mu and interactions failed to converge. The one that did converge is about the same as the simpler additive model by AIC (\(\Delta\)AIC = 2) but is clearly less preferred by BIC (\(\Delta\)BIC = 5.78). In addition, there is no improvement in the estimate for protegens in this interaction model (and the interaction itself cannot be reliably estimated either).

Although community_averaged_mu gives a better fit, I prefer to use community_expected_mu because it fits better within the framework of “can we predict microbial community dynamics from the species traits?” Moreover, the fit (at least for the extinction data) is not much worse for community_expected_mu than for community_averaged_mu – and both are insufficient to explain the data without the effect of protegens anyway!

absDensity <- absDensity %>% select(-community_averaged_mu)
absDen_forFit <- absDen_forFit %>% select(-community_averaged_mu)
extinct.df <- extinct.df %>% select(-community_averaged_mu)

Odds ratios

For easier interpretation of the logistic regression effect sizes, let’s get the odds ratios of extinction. (Note that above we were getting the odds ratios of survival).
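One way to see why the refit below must agree with the earlier survival-scale model: flipping a binary response exactly negates every coefficient on the logit scale, so the odds ratios of extinction are the reciprocals of the odds ratios of survival. A self-contained toy sketch (simulated data, not from this experiment):

```r
# Toy data (simulated, not the experiment's): flipping the binary
# response negates the logit coefficients, so OR_extinct = 1 / OR_survive.
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(d$x))
m_surv <- glm(y ~ x, data = d, family = binomial)
m_ext  <- glm(1 - y ~ x, data = d, family = binomial)
all.equal(exp(coef(m_surv)), 1 / exp(coef(m_ext)), tolerance = 1e-6)  # TRUE
```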

best_model <- glmmTMB((1-survived) ~ Heat + community_expected_mu + protegens,
                          data = extinct.df,
                          family = binomial,
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))

# check that this result is consistent with the model we had before
summary(best_model) # it is indeed consistent
##  Family: binomial  ( logit )
## Formula:          (1 - survived) ~ Heat + community_expected_mu + protegens
## Data: extinct.df
## 
##      AIC      BIC   logLik deviance df.resid 
##     42.1     57.2    -17.0     34.1      320 
## 
## 
## Conditional model:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            3.080e+00  3.035e+00   1.015  0.31019    
## Heat                   2.152e-01  5.309e-02   4.054 5.03e-05 ***
## community_expected_mu -1.463e+01  5.024e+00  -2.912  0.00359 ** 
## protegens1            -2.433e+01  2.154e+04  -0.001  0.99910    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print the odds ratios
exp(confint(best_model))
##                              2.5 %       97.5 %     Estimate
## (Intercept)           5.678175e-02 8.337244e+03 2.175783e+01
## Heat                  1.117583e+00 1.376115e+00 1.240130e+00
## community_expected_mu 2.343786e-11 8.359968e-03 4.426509e-07
## protegens1            0.000000e+00          Inf 2.708401e-11
# get the number of replicates that either did or did not have protegens for each heat treatment:
extinct.df %>% mutate(died = 1-survived) %>%
  group_by(protegens, Heat) %>%
    summarise(total_reps = n(),
              survived = sum(survived),
              extinct = sum(died)) %>%
      as.data.frame()
## `summarise()` has grouped output by 'protegens'. You can override using the
## `.groups` argument.
# do the same but for putida
extinct.df %>% mutate(died = 1-survived) %>%
  group_by(putida, Heat) %>%
    summarise(total_reps = n(),
              survived = sum(survived),
              extinct = sum(died)) %>%
      as.data.frame()
## `summarise()` has grouped output by 'putida'. You can override using the
## `.groups` argument.
rm(best_model)

The effect of protegens is large but unreliably estimated. I also report the total number of replicates with protegens at 24h and 48h, and with putida at 24h and 48h, to give an idea of the sample sizes.

Plot

Plot the preferred model against the data: growth on final day ~ heat duration + protegens present + community_expected_mu

# create data.frame for plotting
extinct.df <- cbind(extinct.df,
                    predicted = predict(ext_mod.exptmu_prot, type="response"))
# plot the predictions against the data
plot(ggplot(extinct.df,
        aes(x=as.factor(Heat),
            y=survived,
            colour=community_expected_mu, 
            group=as.factor(community_expected_mu))) +
    facet_wrap(. ~ protegens,
               labeller = as_labeller(c(`0`="P. protegens absent",
                                        `1`="P. protegens present"))) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_line(aes(y = predicted)) +
    geom_jitter(alpha=0.6, width=0.1, height = 0.25) +
      # would be nice to use the beeswarm package but not sure how to do that, as y is binary (0/1) here but numeric for the model predictions
    scale_y_continuous(breaks = c(0, 1)) +
    scale_colour_viridis_c(option = "inferno", end=0.85) +
    labs(x="Heat duration (hrs)",
         y="Growth in well on last day?", colour="Community\nExpected\nGrowth Rate"))

# plot the effect sizes of the preferred model
extinct_forplot <- data.frame(confint(ext_mod.exptmu_prot))
colnames(extinct_forplot)[1:2] <- c("loCI", "hiCI")
extinct_forplot$predictor <- as.factor(rownames(extinct_forplot))
# protegens effect size is not significant and has giant CI that obscure other estimates
ggplot(extinct_forplot,
       aes(x = Estimate, y = predictor)) +
  geom_vline(xintercept = 0, colour="grey") +
  geom_point() +
  geom_errorbarh(aes(xmin = loCI, xmax = hiCI), height=0)

# plot the effect sizes again without protegens
ggplot(extinct_forplot %>% filter(predictor != "protegens1"),
       aes(x = Estimate, y = predictor)) +
  geom_vline(xintercept = 0, colour="grey") +
  geom_point() +
  geom_errorbarh(aes(xmin = loCI, xmax = hiCI), height=0)

# clean up
rm(ext_mod.heat, ext_mod.heat_plus_rich, ext_mod.heat_plus_resist, ext_mod.heat_plus_prot, ext_mod.heat_by_rich, ext_mod.heat_by_prot, ext_mod.heat_rich_prot, ext_mod.rich_heatXprot, ext_mod.expect_mu, ext_mod.averaged_mu, ext_mod.exptmu_prot, ext_mod.avemu_prot, extinct_forplot, ext_mod.exptmu_prot_resist, ext_mod.exptmuXprot, ext_mod.heatXexptmu_prot, ext_mod.heatXprot_exptmu, ext_mod.heatXprotXexptmu)

Shannon diversity

How is community diversity impacted during and after heat? Here we will have to be mindful to control for inoculated community richness as a nuisance variable: we always expect more diversity in communities that were inoculated with more species, and less in those inoculated with fewer. But this is just part of our experimental design; we’re not interested in this effect per se.

Let’s first plot the Shannon diversity directly to get an idea of what we’re dealing with:

ggplot(absDen_forFit %>% filter(CommRich > 1), # monocultures are meaningless for diversity
       aes(y=Diversity, x=Day, fill=as.factor(CommRich))) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_quasirandom(alpha=0.7, pch=21) +
  scale_fill_viridis_d(option = "viridis", begin=0.85, end=0) +
  labs(y = "Shannon Diversity",
       fill = "Inoculated\nRichness")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`position_quasirandom()`).

ggplot(absDen_forFit %>% filter(CommRich > 1), # monocultures are meaningless for diversity
       aes(y=Diversity, x=Day, fill=community_expected_mu)) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_quasirandom(alpha=0.7, pch=21) +
  scale_fill_viridis_c(option = "inferno", end=0.85) +
  labs(y = "Shannon Diversity",
       fill = "Expected\nCommunity mu")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`position_quasirandom()`).

Choose GLM Family

Maddy and Gerard suggest that I use the full model to estimate effect sizes and then emmeans to estimate them post hoc. I’m modelling this analysis on the example script that Nico sent me (Script_simplified for Hermina.R).

One unique attribute of my experimental design is that the Day used as control differs with heat duration (e.g., the last day of recovery for the 6h heat duration is Day 3, but the last day of recovery for 48h is Day 5). Our solution for this is to run separate models for each heat treatment with its respective controls (i.e., 4 models in total). To make sure that the effect sizes will be directly comparable across the models (especially with respect to the standard deviation), Gerard suggested that I scale the whole dataset prior to splitting it up into 4 (but without centering it, as centering would give negative values that I can’t really use). Finally, if/when testing for significance it will then be necessary to control for multiple testing (e.g., using a Bonferroni correction).
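The Bonferroni step is a one-liner with `p.adjust`; a minimal sketch using hypothetical p-values (one per heat-treatment model, not real results):

```r
# Hypothetical p-values from the 4 per-heat-treatment models; Bonferroni
# multiplies each by the number of tests (4 here), capped at 1.
p_raw <- c(`6h` = 0.012, `12h` = 0.030, `24h` = 0.200, `48h` = 0.004)
p.adjust(p_raw, method = "bonferroni")
# 6h = 0.048, 12h = 0.120, 24h = 0.800, 48h = 0.016
```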

Note that for diversity I am considering CommRich as a numeric (which assumes a linear effect of community richness, e.g., where 4 species is 2 * the effect of 2 species). Initially I tried playing around with CommRich as an ordered (& unordered) factor. But I found that glmmTMB was choosing to drop different predictors because it was upset that my model was overparameterized. This was particularly annoying as it was dropping the estimates for the control treatments…

The first thing we need to do is to pick which GLM family of distributions looks best for our data:

# remove the monocultures from the data
diversity_forFit <- absDen_forFit %>% filter(CommRich > 1) %>% # diversity is nonsense for monocultures
                        select(-Total_density, -Conc_putida, -Conc_protegens, -Conc_grimontii, -Conc_veronii)

# scale the data by its standard deviation
diversity_forFit$Diversity_scale <- scale(diversity_forFit$Diversity,
                                          scale = sd(diversity_forFit$Diversity, na.rm =  TRUE),
                                          center = FALSE)

# the max re-scaled value is 5.38 and 38% of the data is 0's
# so try gamma and lognormal distributions (maybe also Gaussian just to check that it's a bad fit?)
summary(diversity_forFit$Diversity_scale)
##        V1         
##  Min.   :0.00000  
##  1st Qu.:0.00000  
##  Median :0.01955  
##  Mean   :0.45128  
##  3rd Qu.:0.19557  
##  Max.   :5.37608  
##  NA's   :12
sum(diversity_forFit$Diversity_scale == 0, na.rm = TRUE) / length(diversity_forFit$Diversity_scale)
## [1] 0.3850868
# let's compare different GLM families
try_gaussian <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                        data = diversity_forFit,
                        control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 1 0.196 0.26 0.992 1 0.2 0.208 0.944 0.208 0.248 1 0.18 0.26 0.224 1 0.212 0.212 0.244 1 0.256 ...
print("gamma family with zero-inflated model:")
## [1] "gamma family with zero-inflated model:"
try_gamma0 <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                     data = diversity_forFit,
                     family = ziGamma,
                     ziformula = ~1, # this needs to be added because there are 0 values in the data
                     control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
## (warning repeated 13 times)
simulateResiduals(fittedModel = try_gamma0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.952 0.42 0.3568358 0.944 0.94 0.3401457 0.3068186 0.88 0.58 0.1703229 0.988 0.456 0.612 0.512 0.964 0.1434111 0.532 0.428 0.984 0.424 ...
print("lognormal family with zero-inflated model:")
## [1] "lognormal family with zero-inflated model:"
try_lognorm0 <- glmmTMB(Diversity_scale ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                       data = diversity_forFit,
                       family = lognormal,
                       ziformula = ~1, # this needs to be added because there are 0 values in the data
                       control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.944 0.392 0.160635 0.952 0.948 0.1410465 0.2099037 0.948 0.488 0.3268879 0.968 0.476 0.628 0.5 0.972 0.09914734 0.556 0.4 0.988 0.452 ...
print("log(x+1) transformed data, lognormal family with zero-inflated model:")
## [1] "log(x+1) transformed data, lognormal family with zero-inflated model:"
try_LOGlognorm0 <- glmmTMB(log(Diversity_scale+1) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                          data = diversity_forFit,
                          family = lognormal,
                          ziformula = ~1, # I'm keeping this as 0-inflated lognormal alone was already over-dispersed. So I want to see if the log(x+1) transformation sufficiently brings in the long tail.
                          control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_LOGlognorm0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.932 0.38 0.160635 0.94 0.92 0.1410465 0.2099037 0.94 0.488 0.3268879 0.944 0.476 0.644 0.508 0.956 0.09914734 0.568 0.4 0.976 0.448 ...
try_negbinom <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                       data = diversity_forFit,
                       family = nbinom2,
                       control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.9 0.484 0.02934308 0.888 0.932 0.001846165 0.1464347 0.844 0.524 0.04893932 0.928 0.545532 0.592 0.498831 0.964 0.08390474 0.62 0.436 0.972 0.496 ...
try_negbinom0 <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                         data = diversity_forFit,
                         family = nbinom2,
                         ziformula = ~1, # try zero inflated distribution
                         control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.968 0.4760399 0.09975181 0.944 0.96 0.1870824 0.1012531 0.856 0.524 0.2883702 0.972 0.488 0.5489878 0.472 0.972 0.2933996 0.576 0.4214444 0.968 0.48 ...
try_poisson <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                         data = diversity_forFit,
                         family = genpois,
                         control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.992 0.4616676 0.04975974 0.968 0.98 0.001601337 0.06714782 0.96 0.5715112 0.02849031 0.98 0.5892411 0.78 0.728 0.984 0.0152989 0.78 0.332584 0.984 0.4979502 ...
try_poisson0 <- glmmTMB(as.integer(Diversity_scale*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                         data = diversity_forFit,
                         family = genpois,
                         ziformula = ~1, # try zero inflated distribution
                         control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.956 0.3819737 0.211971 0.968 0.964 0.01318631 0.2743196 0.924 0.3770803 0.244544 0.972 0.388 0.584 0.4491087 0.96 0.05561836 0.5923928 0.3206311 0.972 0.356 ...
################################################

# I would generally prefer to use a zero inflated distribution but those are annoying to calculate effect sizes for (by posthoc emmeans). So let's do the classic hack and 
  # find the smallest non-zero value in the rescaled diversity estimate
smallest_diversity <- min(diversity_forFit$Diversity_scale[diversity_forFit$Diversity_scale != 0], na.rm=TRUE)
# now add 1/100th of that value to all the diversity estimates and re-do the fit for the family that looked best above
diversity_forFit$Diversity_scalePLUSepsilon <- diversity_forFit$Diversity_scale + smallest_diversity/100

print("data transformed (D + epsilon), lognormal family:")
## [1] "data transformed (D + epsilon), lognormal family:"
# this looks just awful:
try_lognorm_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                             data = diversity_forFit,
                             family = lognormal,
                             control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm_plusE, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.912 0.444 0.072 0.932 0.908 0.04 0.044 0.868 0.48 0.028 0.904 0.472 0.608 0.576 0.9 0.056 0.576 0.34 0.88 0.444 ...
# Let's see if we can improve things with a log transformation (this model looked the best with zero-inflation above)
try_lognorm_logPLUSe <- glmmTMB(log(Diversity_scale+1+smallest_diversity) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                          data = diversity_forFit,
                          family = lognormal,
                          control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_lognorm_logPLUSe, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.96 0.196 0.112 0.972 0.944 0.072 0.088 0.936 0.288 0.052 0.968 0.32 0.544 0.436 0.956 0.084 0.524 0.204 0.956 0.256 ...
# let's just try other families...
try_gaussian_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                        data = diversity_forFit,
                        control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian_plusE, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 1 0.196 0.26 0.992 1 0.2 0.208 0.944 0.208 0.248 1 0.18 0.26 0.224 1 0.212 0.212 0.244 1 0.256 ...
print("data transformed (D + epsilon), gamma family:")
## [1] "data transformed (D + epsilon), gamma family:"
try_gamma_plusE <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                     data = diversity_forFit,
                     family = Gamma,
                     control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
## (warning repeated 9 times)
simulateResiduals(fittedModel = try_gamma_plusE, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.932 0.376 0.156 0.88 0.916 0.104 0.152 0.864 0.42 0.108 0.928 0.38 0.516 0.48 0.884 0.104 0.512 0.364 0.944 0.408 ...
try_negbinom_plusE <- glmmTMB(as.integer(Diversity_scalePLUSepsilon*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                       data = diversity_forFit,
                       family = nbinom2,
                       control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom_plusE, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.888 0.4291341 0.02065348 0.936 0.952 0.05675836 0.05408069 0.876 0.512 0.04733732 0.98 0.464 0.592 0.616 0.956 0.00626431 0.596 0.443546 0.976 0.4538491 ...
try_poisson_plusE <- glmmTMB(as.integer(Diversity_scalePLUSepsilon*1000) ~ CommRich:Day + Heat:Day + CommRich:Heat:Day,
                         data = diversity_forFit,
                         family = genpois,
                         control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson_plusE, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## qu = 0.75, log(sigma) = -2.451302 : outer Newton did not converge fully.

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.992 0.4616676 0.04975974 0.968 0.98 0.001601337 0.06714782 0.96 0.5715112 0.02849031 0.98 0.5892411 0.78 0.728 0.984 0.0152989 0.78 0.331438 0.984 0.496 ...
# let's check this with AIC and BIC
AIC(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0,
    try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE) %>% arrange(AIC)
BIC(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0,
    try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE) %>% arrange(BIC)
# clean up
rm(try_gaussian, try_gamma0, try_lognorm0, try_LOGlognorm0, try_negbinom, try_negbinom0, try_poisson, try_poisson0, try_lognorm_plusE, try_lognorm_logPLUSe, try_gaussian_plusE, try_gamma_plusE, try_negbinom_plusE, try_poisson_plusE, smallest_diversity)

According to the residuals, the zero-inflated negative binomial and the zero-inflated lognormal are about equally okay-ish. (We could also take the AIC & BIC values into consideration, but that is far less important.) My preference would be to go with the zero-inflated lognormal. The reason is that, as I understand it, the most important consideration when selecting a GLM family is which family is a priori the most natural choice. For diversity data, the Gamma or lognormal distributions are the most natural choices a priori because Shannon diversity is a non-negative continuous variable; with 4 species it lies between 0 and ln(4) ≈ 1.386294. Therefore I think it makes sense to choose the lognormal (even if its residuals are not perfect).
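The upper bound mentioned above is easy to verify: for 4 equally abundant species, the Shannon index reaches its maximum of ln(4). A quick check with the vegan package loaded earlier:

```r
# Shannon diversity is maximised when all 4 species are equally abundant:
log(4)
## [1] 1.386294

# the same value from vegan's diversity() on equal counts:
vegan::diversity(rep(25, 4))
## [1] 1.386294
```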

For this data I would generally prefer one of the zero-inflated distributions, but we are interested in effect sizes (emmeans & post-hoc analyses). The problem with a zero-inflated distribution is that it produces very confusing-looking effect sizes in the downstream analyses, because the effect sizes are split between the conditional and the zero-inflated parts of the model. The overall effect size is: (1 - zi)*(conditional mean). See this Stack Overflow thread. The practical problem is that it just looks really weird for this data.
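As a minimal numeric illustration of that decomposition (the numbers here are made up, not taken from any fit in this document):

```r
# overall mean of a zero-inflated response = (1 - zi) * conditional mean
zi_prob   <- 0.30   # hypothetical probability of a structural zero
cond_mean <- 0.85   # hypothetical conditional (non-zero-part) mean
(1 - zi_prob) * cond_mean
## [1] 0.595
```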

To get around this issue, I transformed the re-scaled data by adding a small value (min(rescaled diversity)/100) to all diversity estimates, which removes the zeros. This way we no longer need a zero-inflated part in the model, and the effect sizes will be simpler to explain. (This matters especially because I think I am losing a lot of power for the 48h treatment as a result of almost everything literally being 0.)
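A self-contained sketch of that transformation on toy values (in the real script it is applied to the rescaled Shannon diversity column; the minimum is taken over the non-zero values, since the point is to remove the zeros):

```r
# add a small epsilon (smallest non-zero value / 100) so no zeros remain
d   <- c(0, 0.2, 0.5, 0)     # toy rescaled diversity values
eps <- min(d[d > 0]) / 100   # 0.2/100 = 0.002
d + eps
## [1] 0.002 0.202 0.502 0.002
```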

My final decision is to use the \(D + \epsilon\) transformed data with the lognormal family. It doesn’t have the best residuals, but it strikes the right compromise: a priori justifiable, as well as useful and interpretable for the analysis we’re interested in.

Finally, note that in the model fitting above I treat Day as a numeric predictor. This is because I want to decide on the GLM family using the complete data. (I was also having problems with Day as an ordered/unordered factor…)

Compare fit of different models to data subsets

For the rest of the analysis below, I consider the effect of day (called Trtmt_Day) as a factor representing either resistance (i.e., the last day of heat) or recovery. To do this I need to subset the data to include only the resistance day, the 1st day post-heat, and the last day post-heat for the heat treatment, keeping exactly the same days of the control treatment. I create this data subset for each heat treatment duration (so 4 data subsets in total).

We want to find which model is the best fit for all the data subsets.

Note that it’s not possible to fit any models with both protegens presence and heat resistance as predictors!

# a hacky function to get the number of parameters in the model
npar_of_glmmTMB_fit <- function(modfit)
  length(modfit$fit$parfull)
  
# a function to fit the different models to the subsetted data:
fit_diversity_models <- function(data_subset) {
  # create list for output
  output.ls <- list()
  
  # this is the simplest model. I'm fitting it to check for collinearity
  output.ls[["simple"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat + Trtmt_Day + protegens + community_expected_mu,
                               data = data_subset,
                               family = lognormal,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  # this is another simple model to check for collinearity
  output.ls[["simple resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat + Trtmt_Day + community_expected_mu + resistant,
                               data = data_subset,
                               family = lognormal,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  # this is our null model:
  output.ls[["H0"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day,
                               data = data_subset,
                               family = lognormal,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  # this is another null model just to confirm that CommRich has NO interactions with heat
  output.ls[["H0: *CommRich"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich*Heat*Trtmt_Day,
                                          data = data_subset,
                                          family = lognormal,
                                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["+resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + resistant,
                                   data = data_subset,
                                   family = lognormal,
                                   control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))

  output.ls[["*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant,
                                   data = data_subset,
                                   family = lognormal,
                                   control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["+prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + protegens,
                                  data = data_subset,
                                  family = lognormal,
                                  control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens,
                                  data = data_subset,
                                  family = lognormal,
                                  control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["+mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + community_expected_mu,
                                data = data_subset,
                                family = lognormal,
                                control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu,
                                data = data_subset,
                                family = lognormal,
                                control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["+prot +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + protegens + community_expected_mu,
                                      data = data_subset,
                                      family = lognormal,
                                      control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["+resist +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day + resistant + community_expected_mu,
                                        data = data_subset,
                                        family = lognormal,
                                        control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*prot +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens + community_expected_mu,
                                      data = data_subset,
                                      family = lognormal,
                                      control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*prot + prot*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens + protegens*community_expected_mu,
                                            data = data_subset,
                                            family = lognormal,
                                            control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*mu +prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + protegens,
                                      data = data_subset,
                                      family = lognormal,
                                      control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*mu + mu*prot"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens,
                                          data = data_subset,
                                          family = lognormal,
                                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*mu +resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + resistant,
                                        data = data_subset,
                                        family = lognormal,
                                        control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*mu + mu*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu + community_expected_mu*resistant,
                                            data = data_subset,
                                            family = lognormal,
                                            control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*resist +mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant + community_expected_mu,
                                        data = data_subset,
                                        family = lognormal,
                                        control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))

  output.ls[["*resist + resist*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*resistant + resistant*community_expected_mu,
                                                data = data_subset,
                                                family = lognormal,
                                                control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  output.ls[["*prot*mu"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*protegens*community_expected_mu,
                                     data = data_subset,
                                     family = lognormal,
                                     control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  output.ls[["*mu*resist"]] <- glmmTMB(Diversity_scalePLUSepsilon ~ CommRich + Heat*Trtmt_Day*community_expected_mu*resistant,
                                       data = data_subset,
                                       family = lognormal,
                                       control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  return(output.ls)
}

# a function to plot model predictions against the data
  # INOCULATED COMMUNITY RICHNESS is plotted as different colours
plot_model_pred.CommRich <- function(mod_list, mod_name){
  # create data.frame for plotting
  divers_predict <- cbind(mod_list[[mod_name]]$frame,
                          predicted=predict(mod_list[[mod_name]], type="response"))
  # change the first column name for easier plotting
  colnames(divers_predict)[1] <- "observed"
  # create the plot
  out <- ggplot(divers_predict, 
                aes(x=Trtmt_Day, y=observed, colour=as.factor(CommRich))) +
         facet_grid(protegens~Heat) +
         geom_jitter(alpha=0.4) +
         geom_line(aes(y=predicted, group=as.factor(CommRich))) +
         scale_y_log10() +
         scale_colour_viridis_d(option = "viridis", begin=0.1, end=0.85) +
         labs(y="Shannon diversity (rescaled)",
              colour="CommRich",
              title=paste(mod_name, "model predictions"))
  return(out)
}

# a function to plot model predictions against the data
  # EXPECTED COMMUNITY MU is plotted as different colours
plot_model_pred.MU <- function(mod_list, mod_name){
  # create data.frame for plotting
  divers_predict <- cbind(mod_list[[mod_name]]$frame,
                          predicted=predict(mod_list[[mod_name]], type="response"))
  # change the first column name for easier plotting
  colnames(divers_predict)[1] <- "observed"
  # create the plot
  out <- ggplot(divers_predict, 
                aes(x=Trtmt_Day, y=observed,
                    colour=community_expected_mu, group=as.factor(community_expected_mu))) +
         facet_grid(protegens~Heat) +
         geom_jitter(alpha=0.4) +
         geom_line(aes(y=predicted)) +
         scale_y_log10() +
         scale_colour_viridis_c(option = "inferno", end=0.85) +
         labs(y="Shannon diversity (rescaled)",
              colour="Expected\nCommunity mu",
              title=paste(mod_name, "model predictions"))
  return(out)
}

####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
diversity_6h <- rbind(diversity_forFit %>% filter(Heat == "6"),
                      diversity_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_6h$Trtmt_Day <- "resist"
diversity_6h$Trtmt_Day[diversity_6h$Day == 2] <- "recov_1"
diversity_6h$Trtmt_Day[diversity_6h$Day == 3] <- "recov_2"

# fit different models:
div_mods6h <- fit_diversity_models(diversity_6h)

# check the simplest possible models for multicollinearity
check_collinearity(div_mods6h[["simple"]])
check_collinearity(div_mods6h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods6h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.06 0.72 0.456 0.956 0.088 0.156 0.768 0.852 0.352 0.892 0.812 0.64 0.248 0.804 0.696 0.728 0.216 0.872 0.284 0.916 ...
summary(div_mods6h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##    -1202    -1143      617    -1234      280 
## 
## 
## Dispersion parameter for lognormal family (): 1.21 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -1.12238    0.65007  -1.727  0.08425 .  
## CommRich                                0.65806    0.07012   9.384  < 2e-16 ***
## Heatcontrol                             1.55744    0.17430   8.935  < 2e-16 ***
## Trtmt_Dayrecov_2                       -0.22741    0.20855  -1.090  0.27551    
## Trtmt_Dayresist                         0.05919    0.19731   0.300  0.76419    
## protegens                              -8.00074    1.08149  -7.398 1.38e-13 ***
## community_expected_mu                  -1.03511    0.66866  -1.548  0.12161    
## Heatcontrol:Trtmt_Dayrecov_2           -0.28588    0.25017  -1.143  0.25314    
## Heatcontrol:Trtmt_Dayresist             0.17917    0.22391   0.800  0.42360    
## Heatcontrol:protegens                  -1.00383    0.34733  -2.890  0.00385 ** 
## Trtmt_Dayrecov_2:protegens              0.15475    0.36833   0.420  0.67438    
## Trtmt_Dayresist:protegens               0.10914    0.35737   0.305  0.76006    
## protegens:community_expected_mu         5.56391    1.17388   4.740 2.14e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens  0.23599    0.49838   0.474  0.63584    
## Heatcontrol:Trtmt_Dayresist:protegens  -0.07657    0.47374  -0.162  0.87160    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods6h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods6h, AIC),
                           AICc = sapply(div_mods6h, AICc),
                           BIC = sapply(div_mods6h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods6h, mod_name="+prot"))

print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot*mu"))

####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
diversity_12h <- rbind(diversity_forFit %>% filter(Heat == "12", Day > 1),
                       diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_12h$Trtmt_Day <- "resist"
diversity_12h$Trtmt_Day[diversity_12h$Day == 3] <- "recov_1"
diversity_12h$Trtmt_Day[diversity_12h$Day == 4] <- "recov_2"

# fit different models:
div_mods12h <- fit_diversity_models(diversity_12h)

# check the simplest possible models for multicollinearity
check_collinearity(div_mods12h[["simple"]])
check_collinearity(div_mods12h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods12h[["*prot + prot*mu"]], plot = TRUE)
## qu = 0.75, log(sigma) = -2.030192 : outer Newton did not converge fully.

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.696 0.028 0.964 0.224 0.628 0.356 0.42 0.992 0.072 0.104 0.252 0.46 0.22 0.02 0.744 0.364 0.82 0.828 0.376 0.208 ...
summary(div_mods12h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##  -1461.6  -1403.3    746.8  -1493.6      266 
## 
## 
## Dispersion parameter for lognormal family (): 1.58 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -0.24192    0.79020  -0.306  0.75949    
## CommRich                                0.89015    0.09009   9.881  < 2e-16 ***
## Heatcontrol                             2.86009    0.31085   9.201  < 2e-16 ***
## Trtmt_Dayrecov_2                        0.52184    0.38211   1.366  0.17204    
## Trtmt_Dayresist                         0.96973    0.36672   2.644  0.00819 ** 
## protegens                              -9.17281    1.28869  -7.118 1.10e-12 ***
## community_expected_mu                  -4.69674    0.83311  -5.638 1.72e-08 ***
## Heatcontrol:Trtmt_Dayrecov_2           -0.56773    0.41311  -1.374  0.16935    
## Heatcontrol:Trtmt_Dayresist            -0.66119    0.39557  -1.672  0.09462 .  
## Heatcontrol:protegens                  -2.20145    0.44028  -5.000 5.73e-07 ***
## Trtmt_Dayrecov_2:protegens             -0.18892    0.48935  -0.386  0.69945    
## Trtmt_Dayresist:protegens              -0.92594    0.47977  -1.930  0.05361 .  
## protegens:community_expected_mu         8.80168    1.37591   6.397 1.58e-10 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.41540    0.61712  -0.673  0.50087    
## Heatcontrol:Trtmt_Dayresist:protegens   0.72654    0.58996   1.232  0.21813    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods12h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods12h, AIC),
                           AICc = sapply(div_mods12h, AICc),
                           BIC = sapply(div_mods12h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods12h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot*mu"))

####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
diversity_24h <- rbind(diversity_forFit %>% filter(Heat == "24", Day > 1),
                       diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_24h$Trtmt_Day <- "resist"
diversity_24h$Trtmt_Day[diversity_24h$Day == 3] <- "recov_1"
diversity_24h$Trtmt_Day[diversity_24h$Day == 4] <- "recov_2"

# fit different models:
div_mods24h <- fit_diversity_models(diversity_24h)

# check the simplest possible models for multicollinearity
check_collinearity(div_mods24h[["simple"]])
check_collinearity(div_mods24h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods24h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.848 0.46 0.116 0.808 0.356 0.788 0.304 0.328 0.2 0.872 0.076 0.152 0.164 0 0.932 0.364 0.744 0.232 0.98 0.956 ...
summary(div_mods24h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   -998.3   -940.7    515.2  -1030.3      254 
## 
## 
## Dispersion parameter for lognormal family (): 1.35 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -1.93414    0.81224  -2.381  0.01725 *  
## CommRich                                0.89152    0.07867  11.333  < 2e-16 ***
## Heatcontrol                             2.28475    0.27144   8.417  < 2e-16 ***
## Trtmt_Dayrecov_2                       -1.19430    0.38527  -3.100  0.00194 ** 
## Trtmt_Dayresist                         1.99389    0.28818   6.919 4.55e-12 ***
## protegens                              -5.22837    1.23774  -4.224 2.40e-05 ***
## community_expected_mu                  -2.11575    0.83966  -2.520  0.01174 *  
## Heatcontrol:Trtmt_Dayrecov_2            1.11271    0.41613   2.674  0.00750 ** 
## Heatcontrol:Trtmt_Dayresist            -1.57427    0.30765  -5.117 3.10e-07 ***
## Heatcontrol:protegens                  -3.49509    0.38270  -9.133  < 2e-16 ***
## Trtmt_Dayrecov_2:protegens              0.32389    0.46735   0.693  0.48829    
## Trtmt_Dayresist:protegens              -3.56825    0.40387  -8.835  < 2e-16 ***
## protegens:community_expected_mu         5.70594    1.28980   4.424 9.69e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.89510    0.59839  -1.496  0.13469    
## Heatcontrol:Trtmt_Dayresist:protegens   3.25067    0.51526   6.309 2.81e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods24h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods24h, AIC),
                           AICc = sapply(div_mods24h, AICc),
                           BIC = sapply(div_mods24h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods24h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot*mu"))

####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
diversity_48h <- rbind(diversity_forFit %>% filter(Heat == "48", Day > 2),
                       diversity_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_48h$Trtmt_Day <- "resist"
diversity_48h$Trtmt_Day[diversity_48h$Day == 4] <- "recov_1"
diversity_48h$Trtmt_Day[diversity_48h$Day == 5] <- "recov_2"

# fit different models:
div_mods48h <- fit_diversity_models(diversity_48h)
## dropping columns from rank-deficient conditional model: Heatcontrol:Trtmt_Dayresist:protegens:community_expected_mu
# check the simplest possible models for multicollinearity
check_collinearity(div_mods48h[["simple"]])
check_collinearity(div_mods48h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods48h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.504 0.468 0.484 0.628 0.368 0.312 0.172 0.572 0.492 0.552 0.412 0.5 0.468 0.4 0.456 0.2 0.24 0.312 0.312 0.904 ...
summary(div_mods48h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##  -1801.9  -1746.8    917.0  -1833.9      216 
## 
## 
## Dispersion parameter for lognormal family (): 1.21 
## 
## Conditional model:
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            -3.618e+00  8.944e-01  -4.045 5.24e-05
## CommRich                                7.245e-01  9.371e-02   7.731 1.07e-14
## Heatcontrol                             4.922e+00  3.864e-01  12.739  < 2e-16
## Trtmt_Dayrecov_2                       -9.450e-07  5.210e-01   0.000  1.00000
## Trtmt_Dayresist                        -5.184e-01  1.529e+00  -0.339  0.73463
## protegens                              -3.822e+00  1.453e+00  -2.631  0.00851
## community_expected_mu                  -2.871e+00  9.902e-01  -2.900  0.00373
## Heatcontrol:Trtmt_Dayrecov_2            1.838e-01  5.434e-01   0.338  0.73519
## Heatcontrol:Trtmt_Dayresist             5.907e-01  1.537e+00   0.384  0.70077
## Heatcontrol:protegens                  -4.332e+00  5.327e-01  -8.133 4.20e-16
## Trtmt_Dayrecov_2:protegens              1.906e-01  6.327e-01   0.301  0.76318
## Trtmt_Dayresist:protegens               1.571e-01  1.618e+00   0.097  0.92266
## protegens:community_expected_mu         4.490e+00  1.554e+00   2.890  0.00386
## Heatcontrol:Trtmt_Dayrecov_2:protegens  2.414e-01  7.330e-01   0.329  0.74195
## Heatcontrol:Trtmt_Dayresist:protegens   4.047e-01  1.661e+00   0.244  0.80748
##                                           
## (Intercept)                            ***
## CommRich                               ***
## Heatcontrol                            ***
## Trtmt_Dayrecov_2                          
## Trtmt_Dayresist                           
## protegens                              ** 
## community_expected_mu                  ** 
## Heatcontrol:Trtmt_Dayrecov_2              
## Heatcontrol:Trtmt_Dayresist               
## Heatcontrol:protegens                  ***
## Trtmt_Dayrecov_2:protegens                
## Trtmt_Dayresist:protegens                 
## protegens:community_expected_mu        ** 
## Heatcontrol:Trtmt_Dayrecov_2:protegens    
## Heatcontrol:Trtmt_Dayresist:protegens     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
           AIC = sapply(div_mods48h, AIC),
           AICc = sapply(div_mods48h, AICc),
           BIC = sapply(div_mods48h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods48h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot +mu"))

####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
                     AIC = sapply(div_mods6h, AIC) + sapply(div_mods12h, AIC) + sapply(div_mods24h, AIC) + sapply(div_mods48h, AIC),
                     AICc = sapply(div_mods6h, AICc) + sapply(div_mods12h, AICc) + sapply(div_mods24h, AICc) + sapply(div_mods48h, AICc),
                     BIC = sapply(div_mods6h, BIC) + sapply(div_mods12h, BIC) + sapply(div_mods24h, BIC) + sapply(div_mods48h, BIC)) %>%
            mutate(AIC = AIC/4,
                   AICc = AICc/4,
                   BIC = BIC/4) %>%
              mutate(dAIC = min(AIC)-AIC,
                     dAICc = min(AICc)-AICc,
                     dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
# clean up
rm(meanIC)

As expected, inoculated community richness has a positive effect on diversity, protegens has a negative effect, and community growth rate has a negative effect. There are interactions with the presence/absence of heat, but the trouble is that we don’t want the model to be overly complex (see the model predictions for the most complex model above; the predictions are just rubbish).

The short heat duration data tends to prefer the community growth rate as an interaction effect with heat, especially for 12h where slow communities had much more diversity in the presence of heat for some reason. As the heat duration gets longer, the data tends to prefer protegens as an interaction effect with heat, especially for 48h where protegens presence now shows more diversity than absence (i.e., because there are extinction events in the absence of protegens).

From the model predictions, we can see why it really doesn’t make sense to use the very complex model (*prot*mu). I’m going to use the *prot + prot*mu model for the analysis below because it includes all the predictors of interest, is not unnecessarily complex, is the 2nd-best fit for each data subset, and is the 2nd-best fit across the entire data (after the overly complex *prot*mu).
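For reference, the information criteria averaged in the model tables trade log-likelihood against parameter count. A base-R sketch of the arithmetic (glmmTMB and AICc compute these internally; `aic`, `aicc`, and `bic` here are hypothetical helper names, not functions from the analysis):

```r
# sketch of the information criteria compared in the model tables,
# with k = number of parameters and n = sample size
aic  <- function(loglik, k) -2 * loglik + 2 * k
# AICc adds a small-sample penalty that vanishes as n grows
aicc <- function(loglik, k, n) aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)
# BIC penalises extra parameters more heavily than AIC once n > e^2 (~7.4)
bic  <- function(loglik, k, n) -2 * loglik + k * log(n)

aic(-100, 5)       # 210
aicc(-100, 5, 50)  # ~211.36
bic(-100, 5, 50)   # ~219.56
```

This is why the most complex model can win on raw AIC but lose on BIC: the `k * log(n)` penalty grows faster with each added interaction term.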

Calculate effect sizes

Even if I wanted to calculate effect sizes for the most complex model, I wouldn’t be able to, because there are too many NA values during resistance at 48h heat. This leads to nonEst values:

emmeans(div_mods48h[["*prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu,
                   data = diversity_48h, type = "response")
## CommRich = 2.52, Trtmt_Day = recov_1, protegens = 0, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        0.0131 0.00504 Inf   0.00620    0.0279
##  control   1.5843 0.25000 Inf   1.16234    2.1593
## 
## CommRich = 2.52, Trtmt_Day = recov_2, protegens = 0, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        0.0131 0.00504 Inf   0.00620    0.0279
##  control   2.1893 0.27700 Inf   1.70785    2.8066
## 
## CommRich = 2.52, Trtmt_Day = resist, protegens = 0, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        nonEst      NA  NA        NA        NA
##  control   1.7680 0.24600 Inf   1.34670    2.3212
## 
## CommRich = 2.52, Trtmt_Day = recov_1, protegens = 1, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        0.0145 0.00403 Inf   0.00845    0.0251
##  control   0.0263 0.00742 Inf   0.01513    0.0457
## 
## CommRich = 2.52, Trtmt_Day = recov_2, protegens = 1, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        0.0173 0.00473 Inf   0.01011    0.0296
##  control   0.0489 0.01180 Inf   0.03045    0.0786
## 
## CommRich = 2.52, Trtmt_Day = resist, protegens = 1, community_expected_mu = 0.89:
##  Heat    response      SE  df asymp.LCL asymp.UCL
##  48        0.0103 0.00489 Inf   0.00407    0.0261
##  control   0.0491 0.01180 Inf   0.03060    0.0788
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale
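The nonEst entries arise because the resistance × 48h cell is empty, which makes the fixed-effect design rank deficient; the same behaviour can be reproduced in miniature with base `lm()` (toy data and hypothetical variable names, not from the experiment):

```r
# toy sketch: drop every observation from one treatment-by-day cell;
# the model matrix loses a column of rank and one coefficient becomes NA
d <- expand.grid(heat = c("48", "control"), day = c("recov", "resist"), rep = 1:3)
d <- subset(d, !(heat == "48" & day == "resist"))  # the empty cell
set.seed(1)
d$y <- rnorm(nrow(d))

fit <- lm(y ~ heat * day, data = d)
coef(fit)  # heatcontrol:dayresist is NA (aliased)
```

Any marginal mean that needs the aliased coefficient — i.e. anything in the empty cell — is inestimable, which emmeans reports as nonEst.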
# use emmeans to get the effect size during heat as compared to control for each of the treatment days AND conditional on protegens
emm_6h <- emmeans(div_mods6h[["*prot + prot*mu"]],
                  ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                  data = diversity_6h, type = "response")
effect_6h <- eff_size(emm_6h, sigma(div_mods6h[["*prot + prot*mu"]]),
                      edf = df.residual(div_mods6h[["*prot + prot*mu"]]))

emm_12h <- emmeans(div_mods12h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_12h, type = "response")
effect_12h <- eff_size(emm_12h, sigma(div_mods12h[["*prot + prot*mu"]]),
                       edf = df.residual(div_mods12h[["*prot + prot*mu"]]))

emm_24h <- emmeans(div_mods24h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_24h, type = "response")
effect_24h <- eff_size(emm_24h, sigma(div_mods24h[["*prot + prot*mu"]]),
                       edf = df.residual(div_mods24h[["*prot + prot*mu"]]))

emm_48h <- emmeans(div_mods48h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_48h, type = "response")
effect_48h <- eff_size(emm_48h, sigma(div_mods48h[["*prot + prot*mu"]]),
                       edf = df.residual(div_mods48h[["*prot + prot*mu"]]))

# a function that extracts the confidence intervals from eff_size *** contingent on protegens ***
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
  ci <- confint(eff_size_object)
  data.frame(Heat = heat_trtmt,
             CommRich = ci[[2]],
             Trtmt_Day = ci[[3]],
             protegens = ci[[4]],
             expected_mu = ci[[5]],
             est = ci[[6]],
             loCI = ci[[9]],
             hiCI = ci[[10]])
}

# create a data.frame for plotting marginal effect sizes using a forest plot
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
                              get_effsize_CIs(effect_6h, heat_trtmt = 6),
                              get_effsize_CIs(effect_12h, heat_trtmt = 12),
                              get_effsize_CIs(effect_24h, heat_trtmt = 24),
                              get_effsize_CIs(effect_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
                                          levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot conditional part of the model
ggplot(div_effects_protegens,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape = as.logical(protegens))) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  scale_colour_manual(values=trtmt_pal) +
  labs(title = "CommRich + Heat*Trtmt_Day*prot + com_expect_mu*prot",
       x = "Effect Size on Shannon Diversity",
       shape = "protegens\npresent?",
       y="Heat duration")
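For context, `eff_size()` puts each contrast on a Cohen’s-d-like scale by dividing the difference in marginal means by the residual SD supplied via `sigma(...)`, with `edf` degrees of freedom. The classic raw-data version of that standardization, as a base-R sketch (`cohens_d` is a hypothetical helper, not part of the pipeline):

```r
# classic Cohen's d: mean difference divided by the pooled SD; eff_size()
# does the analogous rescaling using the model's residual sigma instead
cohens_d <- function(x, y) {
  n_x <- length(x); n_y <- length(y)
  s_pool <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  (mean(x) - mean(y)) / s_pool
}

cohens_d(c(0, 2), c(1, 3))  # -1/sqrt(2), i.e. ~ -0.71
```

This is why the effect sizes plotted here are unitless and comparable across heat durations, even though the underlying diversity models were fit to separate data subsets.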

# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day*protegens, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day*protegens, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day*protegens, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day*protegens, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object
get_posthoc <- function(posthoc_object, heat_trtmt) {
  output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
              data.frame() %>%
                select(-df)
  colnames(output)[3:7] <- c("est", "SE", "loCI", "hiCI", "groups")
  output$Heat <- heat_trtmt
  return(output)
}

# create a data.frame for plotting
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
                               get_posthoc(posthocPROT_6h, heat_trtmt = 6),
                               get_posthoc(posthocPROT_12h, heat_trtmt = 12),
                               get_posthoc(posthocPROT_24h, heat_trtmt = 24),
                               get_posthoc(posthocPROT_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
                                          levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(div_effects_protegens,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=as.logical(protegens))) +
  facet_grid(~protegens) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-2.5, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Shannon Diversity",
       y="Heat duration",
       shape = "protegens\npresent?",
       title = "CommRich + Heat*Trtmt_Day*prot + com_expect_mu*prot")

# Note that there is a significant interaction between Treatment Day & protegens!
# I think that it may still be okay to average over the effects of protegens because they are not crossed

# anyway we can still average over the effect of protegens
# we can do a posthoc on this to illustrate statistically significant effects
posthoc_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object
  ## we need to redefine the function because the colnames have changed now
get_posthoc <- function(posthoc_object, heat_trtmt) {
  output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
              data.frame() %>%
                select(-df)
  colnames(output)[2:6] <- c("est", "SE", "loCI", "hiCI", "groups")
  output$Heat <- heat_trtmt
  return(output)
}
# create a data.frame for plotting
div_effects <- data.frame()
div_effects <- rbind(div_effects,
                     get_posthoc(posthoc_6h, heat_trtmt = 6),
                     get_posthoc(posthoc_12h, heat_trtmt = 12),
                     get_posthoc(posthoc_24h, heat_trtmt = 24),
                     get_posthoc(posthoc_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects$Trtmt_Day <- factor(div_effects$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(div_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(div_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-2.5, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Shannon Diversity",
       y="Heat duration",
       title = "protegens as non-focal predictor (i.e., marginalized)")

#######
# finally, we will do a series of pairwise two-tailed t-tests to compare between heat durations
#######
  # reminder to myself: I tried this as a series of z-tests and that made things more optimistic (aka LESS conservative). The t-test is indeed the more conservative option among the parametric tests.
  # I also looked into a Mann-Whitney U test (aka Wilcoxon rank-sum test; the signed-rank test is its paired counterpart). But since that is a *non-parametric* test, it works on the ranks of the raw data, so by definition it cannot be run from summary statistics alone.

# a function that approximates the sample size from each data subset
estimate_n <- function(data_subset, CommRich = FALSE) {
  # count the unique IDs present on recovery day 2 for the heat treatment
  sub <- data_subset[data_subset$Heat != "control" & data_subset$Trtmt_Day == "recov_2", ]
  # optionally restrict to a specified value of CommRich
  if (CommRich %in% 1:4) sub <- sub[sub$CommRich == CommRich, ]
  # divide by 4 as we want to know the average sample size across CommRich
  length(unique(sub$uniqID)) / 4
}
# a function that runs a two-tailed t-test between row numbers of the div_effects df
  # NB: tsum.test expects sample SDs for s.x/s.y; the SEs passed here are smaller
  # than the SD column computed below, which makes the t statistics anti-conservative
run_ttest <- function(row_x, row_y,
                      summary_stats_df){
  ttest_object <- tsum.test(mean.x = summary_stats_df$est[row_x],
                            s.x = summary_stats_df$SE[row_x],
                            n.x = summary_stats_df$n[row_x],
                            mean.y = summary_stats_df$est[row_y],
                            s.y = summary_stats_df$SE[row_y],
                            n.y = summary_stats_df$n[row_y],
                            alternative="two.sided")
  return(data.frame(t_statistic = ttest_object$statistic,
                    df = ttest_object$parameters,
                    pvalue = ttest_object$p.value))
}

# estimate the sample sizes
temp <- div_effects # copy the effects to temp
div_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(diversity_6h)),
                     temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(diversity_12h)),
                     temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(diversity_24h)),
                     temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(diversity_48h)))
rm(temp)
# estimate the SD from the SE
div_effects <- div_effects %>% mutate(SD = SE * sqrt(n)) %>%
    # re-order by Heat and Trtmt_Day
                          arrange(Heat, Trtmt_Day)

# all pairwise combinations of comparisons between the same treatment day for different durations
  # after arrange(Heat, Trtmt_Day), rows 1, 4, 7, 10 hold the Resistance rows for
  # 6h, 12h, 24h, 48h; adding 1 or 2 shifts to Early/Late Recovery
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
# loop through all the combinations and do the t-tests
divEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
  divEffects_ttests <- rbind(divEffects_ttests,
                             run_ttest(row_x = combos[i,1],
                                       row_y = combos[i,2],
                                       summary_stats_df = div_effects))
}
divEffects_ttests$adjusted_p <- p.adjust(divEffects_ttests$pvalue, method = "bonferroni")
divEffects_ttests$Trtmt_Day <- div_effects$Trtmt_Day[combos[,1]]
divEffects_ttests$Heat_1 <- div_effects$Heat[combos[,1]]
divEffects_ttests$Heat_2 <- div_effects$Heat[combos[,2]]

print(divEffects_ttests)
##     t_statistic       df       pvalue   adjusted_p      Trtmt_Day Heat_1 Heat_2
## t     -1.170623 24.24747 2.531273e-01 1.000000e+00     Resistance      6     12
## t1   -10.327820 23.17869 3.811551e-10 6.860792e-09     Resistance      6     24
## t2     9.788123 11.31564 7.291091e-07 1.312396e-05     Resistance      6     48
## t3    -9.253941 22.45402 4.028679e-09 7.251623e-08     Resistance     12     24
## t4    10.112351 11.28167 5.355511e-07 9.639920e-06     Resistance     12     48
## t5    12.591245 11.28156 5.445801e-08 9.802443e-07     Resistance     24     48
## t6     4.317090 24.20276 2.321737e-04 4.179126e-03 Early Recovery      6     12
## t7    -8.380205 22.96627 1.938259e-08 3.488867e-07 Early Recovery      6     24
## t8    17.537802 16.84888 2.958032e-12 5.324457e-11 Early Recovery      6     48
## t9   -12.567165 22.39988 1.267737e-11 2.281927e-10 Early Recovery     12     24
## t10   14.464154 16.90571 5.971506e-11 1.074871e-09 Early Recovery     12     48
## t11   23.273543 16.99798 2.486209e-14 4.475177e-13 Early Recovery     24     48
## t12   -1.928909 24.24804 6.552685e-02 1.000000e+00  Late Recovery      6     12
## t13    2.366998 21.49121 2.738265e-02 4.928876e-01  Late Recovery      6     24
## t14   22.839750 17.67621 1.441230e-14 2.594215e-13  Late Recovery      6     48
## t15    4.097854 20.91181 5.179505e-04 9.323109e-03  Late Recovery     12     24
## t16   24.384022 17.31424 7.558979e-15 1.360616e-13  Late Recovery     12     48
## t17   19.527338 19.61981 2.542227e-14 4.576009e-13  Late Recovery     24     48
# these p-values seem overly optimistic. Use alpha = 1*10^-3

# cleanup
rm(div_mods6h, div_mods12h, div_mods24h, div_mods48h,
   emm_6h, emm_12h, emm_24h, emm_48h, effect_6h, effect_12h, effect_24h, effect_48h,
   posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h,
   temp, combos, divEffects_ttests,
   div_effects_protegens, div_effects)
## Warning in rm(div_mods6h, div_mods12h, div_mods24h, div_mods48h, emm_6h, :
## object 'temp' not found
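For reference, `tsum.test()` (from the BSDA package) with its default unequal-variance setting is Welch’s two-sample t-test computed from summary statistics; the arithmetic can be sketched in base R (`welch_tsum` is a hypothetical helper, not part of the analysis):

```r
# base-R sketch of Welch's t-test from summary statistics (means, SDs, ns),
# matching what t.test() computes on the raw data
welch_tsum <- function(mean.x, s.x, n.x, mean.y, s.y, n.y) {
  v_x <- s.x^2 / n.x   # squared standard errors of each mean
  v_y <- s.y^2 / n.y
  t_stat <- (mean.x - mean.y) / sqrt(v_x + v_y)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (v_x + v_y)^2 / (v_x^2 / (n.x - 1) + v_y^2 / (n.y - 1))
  data.frame(t_statistic = t_stat, df = df, pvalue = 2 * pt(-abs(t_stat), df))
}

# agrees with t.test() on raw data
x <- c(1.1, 2.0, 2.9, 4.2); y <- c(3.5, 5.1, 6.0, 7.4)
welch_tsum(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
```

The formula also shows why passing an SE where an SD belongs shrinks `sqrt(v_x + v_y)` and inflates the t statistic.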

We don’t see any significant decoupling here between the effect sizes during resistance and during recovery.

The main effect we see is that diversity drops sharply for the 48h heat pulse. This is mostly due to loss of the most sensitive species, but it can also be driven by extinction of entire replicates. These data include all replicates, even those that went extinct altogether, and both monocultures and extinct replicates end with a Shannon diversity of 0. So let’s repeat the analysis to show that it’s not the extinct wells that are driving the low effect size at the 48h duration.
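To make the zero-diversity point concrete: under the Shannon index (presumably computed via vegan::diversity earlier in the pipeline), a monoculture and a fully extinct well are indistinguishable — both score 0. A base-R sketch (`shannon` is a hypothetical helper):

```r
# Shannon diversity H' = -sum(p_i * log(p_i)) over species with nonzero counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  if (length(p) == 0) return(0)  # fully extinct well: define H' = 0
  -sum(p * log(p))
}

shannon(c(10, 10, 10, 10))  # even 4-species community: log(4) ~ 1.386
shannon(c(40, 0, 0, 0))     # monoculture: 0
shannon(c(0, 0, 0, 0))      # extinct replicate: also 0
```

Removing extinct replicates therefore only changes the result if the 48h effect was driven by whole-well extinctions rather than by collapse to one surviving species.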

Repeat diversity analysis removing extinct replicates

Let’s show that the strong effect at 48h is not due entirely to the presence of the extinct replicates. I will redo the entire analysis above, this time using only the data without the extinct replicates.

# add a column indicating whether the replicate survived
  # but first we need to remove $Heat because it's a factor for diversity but numeric for extinctions and cannot be *_joined
tmp_div <- diversity_forFit %>% select(-Heat)
tmp_div <- inner_join(tmp_div,
                      extinct.df %>% select(uniqID, survived),
                      by = c("uniqID"))
diversity_forFit$survived <- tmp_div$survived
rm(tmp_div)

# keep just the diversity values that did not go extinct
diversity_forFit <- diversity_forFit %>% filter(survived == 1)

####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
diversity_6h <- rbind(diversity_forFit %>% filter(Heat == "6"),
                      diversity_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_6h$Trtmt_Day <- "resist"
diversity_6h$Trtmt_Day[diversity_6h$Day == 2] <- "recov_1"
diversity_6h$Trtmt_Day[diversity_6h$Day == 3] <- "recov_2"

# fit different models:
div_mods6h <- fit_diversity_models(diversity_6h)

# check the simplest possible models for multicolinearity
check_collinearity(div_mods6h[["simple"]])
check_collinearity(div_mods6h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods6h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.06 0.72 0.456 0.956 0.088 0.156 0.768 0.852 0.352 0.892 0.812 0.64 0.248 0.804 0.696 0.728 0.216 0.872 0.284 0.916 ...
summary(div_mods6h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##    -1202    -1143      617    -1234      280 
## 
## 
## Dispersion parameter for lognormal family (): 1.21 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -1.12238    0.65007  -1.727  0.08425 .  
## CommRich                                0.65806    0.07012   9.384  < 2e-16 ***
## Heatcontrol                             1.55744    0.17430   8.935  < 2e-16 ***
## Trtmt_Dayrecov_2                       -0.22741    0.20855  -1.090  0.27551    
## Trtmt_Dayresist                         0.05919    0.19731   0.300  0.76419    
## protegens                              -8.00074    1.08149  -7.398 1.38e-13 ***
## community_expected_mu                  -1.03511    0.66866  -1.548  0.12161    
## Heatcontrol:Trtmt_Dayrecov_2           -0.28588    0.25017  -1.143  0.25314    
## Heatcontrol:Trtmt_Dayresist             0.17917    0.22391   0.800  0.42360    
## Heatcontrol:protegens                  -1.00383    0.34733  -2.890  0.00385 ** 
## Trtmt_Dayrecov_2:protegens              0.15475    0.36833   0.420  0.67438    
## Trtmt_Dayresist:protegens               0.10914    0.35737   0.305  0.76006    
## protegens:community_expected_mu         5.56391    1.17388   4.740 2.14e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens  0.23599    0.49838   0.474  0.63584    
## Heatcontrol:Trtmt_Dayresist:protegens  -0.07657    0.47374  -0.162  0.87160    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods6h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods6h, AIC),
                           AICc = sapply(div_mods6h, AICc),
                           BIC = sapply(div_mods6h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods6h, mod_name="+prot"))

print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods6h, mod_name="*prot*mu"))

####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
diversity_12h <- rbind(diversity_forFit %>% filter(Heat == "12", Day > 1),
                       diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_12h$Trtmt_Day <- "resist"
diversity_12h$Trtmt_Day[diversity_12h$Day == 3] <- "recov_1"
diversity_12h$Trtmt_Day[diversity_12h$Day == 4] <- "recov_2"

# fit different models:
div_mods12h <- fit_diversity_models(diversity_12h)

# check the simplest possible models for multicolinearity
check_collinearity(div_mods12h[["simple"]])
check_collinearity(div_mods12h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods12h[["*prot + prot*mu"]], plot = TRUE)
## qu = 0.75, log(sigma) = -2.030192 : outer Newton did not converge fully.

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.696 0.028 0.964 0.224 0.628 0.356 0.42 0.992 0.072 0.104 0.252 0.46 0.22 0.02 0.744 0.364 0.82 0.828 0.376 0.208 ...
summary(div_mods12h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##  -1461.6  -1403.3    746.8  -1493.6      266 
## 
## 
## Dispersion parameter for lognormal family (): 1.58 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -0.24192    0.79020  -0.306  0.75949    
## CommRich                                0.89015    0.09009   9.881  < 2e-16 ***
## Heatcontrol                             2.86009    0.31085   9.201  < 2e-16 ***
## Trtmt_Dayrecov_2                        0.52184    0.38211   1.366  0.17204    
## Trtmt_Dayresist                         0.96973    0.36672   2.644  0.00819 ** 
## protegens                              -9.17281    1.28869  -7.118 1.10e-12 ***
## community_expected_mu                  -4.69674    0.83311  -5.638 1.72e-08 ***
## Heatcontrol:Trtmt_Dayrecov_2           -0.56773    0.41311  -1.374  0.16935    
## Heatcontrol:Trtmt_Dayresist            -0.66119    0.39557  -1.672  0.09462 .  
## Heatcontrol:protegens                  -2.20145    0.44028  -5.000 5.73e-07 ***
## Trtmt_Dayrecov_2:protegens             -0.18892    0.48935  -0.386  0.69945    
## Trtmt_Dayresist:protegens              -0.92594    0.47977  -1.930  0.05361 .  
## protegens:community_expected_mu         8.80168    1.37591   6.397 1.58e-10 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.41540    0.61712  -0.673  0.50087    
## Heatcontrol:Trtmt_Dayresist:protegens   0.72654    0.58996   1.232  0.21813    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods12h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods12h, AIC),
                           AICc = sapply(div_mods12h, AICc),
                           BIC = sapply(div_mods12h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods12h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods12h, mod_name="*prot*mu"))

####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
diversity_24h <- rbind(diversity_forFit %>% filter(Heat == "24", Day > 1),
                       diversity_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_24h$Trtmt_Day <- "resist"
diversity_24h$Trtmt_Day[diversity_24h$Day == 3] <- "recov_1"
diversity_24h$Trtmt_Day[diversity_24h$Day == 4] <- "recov_2"

# fit different models:
div_mods24h <- fit_diversity_models(diversity_24h)

# check the simplest possible models for multicolinearity
check_collinearity(div_mods24h[["simple"]])
check_collinearity(div_mods24h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods24h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.848 0.46 0.116 0.808 0.356 0.788 0.304 0.328 0.2 0.872 0.076 0.152 0.164 0 0.932 0.364 0.744 0.232 0.98 0.956 ...
summary(div_mods24h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   -998.3   -940.7    515.2  -1030.3      254 
## 
## 
## Dispersion parameter for lognormal family (): 1.35 
## 
## Conditional model:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            -1.93414    0.81224  -2.381  0.01725 *  
## CommRich                                0.89152    0.07867  11.333  < 2e-16 ***
## Heatcontrol                             2.28475    0.27144   8.417  < 2e-16 ***
## Trtmt_Dayrecov_2                       -1.19430    0.38527  -3.100  0.00194 ** 
## Trtmt_Dayresist                         1.99389    0.28818   6.919 4.55e-12 ***
## protegens                              -5.22837    1.23774  -4.224 2.40e-05 ***
## community_expected_mu                  -2.11575    0.83966  -2.520  0.01174 *  
## Heatcontrol:Trtmt_Dayrecov_2            1.11271    0.41613   2.674  0.00750 ** 
## Heatcontrol:Trtmt_Dayresist            -1.57427    0.30765  -5.117 3.10e-07 ***
## Heatcontrol:protegens                  -3.49509    0.38270  -9.133  < 2e-16 ***
## Trtmt_Dayrecov_2:protegens              0.32389    0.46735   0.693  0.48829    
## Trtmt_Dayresist:protegens              -3.56825    0.40387  -8.835  < 2e-16 ***
## protegens:community_expected_mu         5.70594    1.28980   4.424 9.69e-06 ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens -0.89510    0.59839  -1.496  0.13469    
## Heatcontrol:Trtmt_Dayresist:protegens   3.25067    0.51526   6.309 2.81e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods24h, npar_of_glmmTMB_fit),
                           AIC = sapply(div_mods24h, AIC),
                           AICc = sapply(div_mods24h, AICc),
                           BIC = sapply(div_mods24h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods24h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods24h, mod_name="*prot*mu"))

####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
diversity_48h <- rbind(diversity_forFit %>% filter(Heat == "48", Day > 2),
                       diversity_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
diversity_48h$Trtmt_Day <- "resist"
diversity_48h$Trtmt_Day[diversity_48h$Day == 4] <- "recov_1"
diversity_48h$Trtmt_Day[diversity_48h$Day == 5] <- "recov_2"

# fit different models:
div_mods48h <- fit_diversity_models(diversity_48h)
## dropping columns from rank-deficient conditional model: Heatcontrol:Trtmt_Dayresist:protegens:community_expected_mu
# check the simplest possible models for multicolinearity
check_collinearity(div_mods48h[["simple"]])
check_collinearity(div_mods48h[["simple resist"]])
# check the fit of the overall preferred model
simulateResiduals(fittedModel = div_mods48h[["*prot + prot*mu"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.48 0.532 0.572 0.604 0.328 0.312 0.176 0.54 0.42 0.5 0.416 0.352 0.504 0.492 0.516 0.18 0.3 0.332 0.904 0.44 ...
summary(div_mods48h[["*prot + prot*mu"]])
##  Family: lognormal  ( log )
## Formula:          
## Diversity_scalePLUSepsilon ~ CommRich + Heat * Trtmt_Day * protegens +  
##     protegens * community_expected_mu
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##  -1563.0  -1509.0    797.5  -1595.0      200 
## 
## 
## Dispersion parameter for lognormal family (): 1.22 
## 
## Conditional model:
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            -3.173e+00  1.016e+00  -3.122  0.00180
## CommRich                                7.391e-01  9.482e-02   7.795 6.44e-15
## Heatcontrol                             4.656e+00  5.362e-01   8.683  < 2e-16
## Trtmt_Dayrecov_2                        1.496e-06  7.377e-01   0.000  1.00000
## Trtmt_Dayresist                        -7.944e-01  1.576e+00  -0.504  0.61415
## protegens                              -4.320e+00  1.550e+00  -2.788  0.00531
## community_expected_mu                  -3.113e+00  1.006e+00  -3.094  0.00197
## Heatcontrol:Trtmt_Dayrecov_2            1.705e-01  7.536e-01   0.226  0.82104
## Heatcontrol:Trtmt_Dayresist             8.643e-01  1.583e+00   0.546  0.58513
## Heatcontrol:protegens                  -4.065e+00  6.496e-01  -6.259 3.89e-10
## Trtmt_Dayrecov_2:protegens              1.907e-01  8.204e-01   0.232  0.81619
## Trtmt_Dayresist:protegens               4.327e-01  1.663e+00   0.260  0.79464
## protegens:community_expected_mu         4.751e+00  1.570e+00   3.026  0.00248
## Heatcontrol:Trtmt_Dayrecov_2:protegens  2.528e-01  9.000e-01   0.281  0.77882
## Heatcontrol:Trtmt_Dayresist:protegens   1.329e-01  1.703e+00   0.078  0.93780
##                                           
## (Intercept)                            ** 
## CommRich                               ***
## Heatcontrol                            ***
## Trtmt_Dayrecov_2                          
## Trtmt_Dayresist                           
## protegens                              ** 
## community_expected_mu                  ** 
## Heatcontrol:Trtmt_Dayrecov_2              
## Heatcontrol:Trtmt_Dayresist               
## Heatcontrol:protegens                  ***
## Trtmt_Dayrecov_2:protegens                
## Trtmt_Dayresist:protegens                 
## protegens:community_expected_mu        ** 
## Heatcontrol:Trtmt_Dayrecov_2:protegens    
## Heatcontrol:Trtmt_Dayresist:protegens     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# print a summary table of the model fits
data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
           AIC = sapply(div_mods48h, AIC),
           AICc = sapply(div_mods48h, AICc),
           BIC = sapply(div_mods48h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# print some of the top model predictions
print(plot_model_pred.CommRich(mod_list=div_mods48h, mod_name="*prot"))

print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot + prot*mu"))

print(plot_model_pred.MU(mod_list=div_mods48h, mod_name="*prot +mu"))

####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(div_mods48h, npar_of_glmmTMB_fit),
                     AIC = sapply(div_mods6h, AIC) + sapply(div_mods12h, AIC) + sapply(div_mods24h, AIC) + sapply(div_mods48h, AIC),
                     AICc = sapply(div_mods6h, AICc) + sapply(div_mods12h, AICc) + sapply(div_mods24h, AICc) + sapply(div_mods48h, AICc),
                     BIC = sapply(div_mods6h, BIC) + sapply(div_mods12h, BIC) + sapply(div_mods24h, BIC) + sapply(div_mods48h, BIC)) %>%
            mutate(AIC = AIC/4,
                   AICc = AICc/4,
                   BIC = BIC/4) %>%
              mutate(dAIC = min(AIC)-AIC,
                     dAICc = min(AICc)-AICc,
                     dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
# clean up
rm(meanIC)

####################
# get the emmeans
####################
# use emmeans to get the effect size during heat as compared to control for each of the treatment days AND conditional on protegens
emm_6h <- emmeans(div_mods6h[["*prot + prot*mu"]],
                  ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                  data = diversity_6h, type = "response")
effect_6h <- eff_size(emm_6h, sigma(div_mods6h[["*prot + prot*mu"]]), edf = df.residual(div_mods6h[["*prot + prot*mu"]]))

emm_12h <- emmeans(div_mods12h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_12h, type = "response")
effect_12h <- eff_size(emm_12h, sigma(div_mods12h[["*prot + prot*mu"]]), edf = df.residual(div_mods12h[["*prot + prot*mu"]]))

emm_24h <- emmeans(div_mods24h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_24h, type = "response")
effect_24h <- eff_size(emm_24h, sigma(div_mods24h[["*prot + prot*mu"]]), edf = df.residual(div_mods24h[["*prot + prot*mu"]]))

emm_48h <- emmeans(div_mods48h[["*prot + prot*mu"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens + community_expected_mu*protegens,
                   data = diversity_48h, type = "response")
effect_48h <- eff_size(emm_48h, sigma(div_mods48h[["*prot + prot*mu"]]), edf = df.residual(div_mods48h[["*prot + prot*mu"]]))

# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
div_effects_protegens <- data.frame()
div_effects_protegens <- rbind(div_effects_protegens,
                              get_effsize_CIs(effect_6h, heat_trtmt = 6),
                              get_effsize_CIs(effect_12h, heat_trtmt = 12),
                              get_effsize_CIs(effect_24h, heat_trtmt = 24),
                              get_effsize_CIs(effect_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects_protegens$Trtmt_Day <- factor(div_effects_protegens$Trtmt_Day,
                                          levels = c("resist", "recov_1", "recov_2"))
levels(div_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot conditional part of the model
ggplot(div_effects_protegens,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape = as.logical(protegens))) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  scale_colour_manual(values=trtmt_pal) +
  labs(title = "Extinct reps removed!",
       x = "Effect Size on Shannon Diversity",
       shape = "protegens\npresent?",
       y="Heat duration")

# regardless, we can still average over the effect of protegens
# and run a post hoc test on this to highlight statistically significant effects
posthoc_6h <- emmeans(effect_6h, pairwise ~ Trtmt_Day, data = diversity_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h, pairwise ~ Trtmt_Day, data = diversity_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h, pairwise ~ Trtmt_Day, data = diversity_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h, pairwise ~ Trtmt_Day, data = diversity_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting
div_effects <- data.frame()
div_effects <- rbind(div_effects,
                     get_posthoc(posthoc_6h, heat_trtmt = 6),
                     get_posthoc(posthoc_12h, heat_trtmt = 12),
                     get_posthoc(posthoc_24h, heat_trtmt = 24),
                     get_posthoc(posthoc_48h, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
div_effects$Trtmt_Day <- factor(div_effects$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(div_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(div_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-2.5, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Shannon Diversity",
       y="Heat duration",
       title = "protegens as non-focal predictor. Extinct reps removed!")

################################
# Plot figure for main text: Figure 4a
################################
fig4a <- ggplot(div_effects,
                aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
          geom_vline(xintercept = 0, colour="darkgrey") +
          geom_point(position = position_dodge(width = 0.5)) +
          geom_errorbarh(position = position_dodge(width = 0.5),
                         aes(xmin = loCI, xmax = hiCI), height = 0.15) +
          scale_x_continuous(limits = c(-4.35, 0.9), expand = c(0,0)) +
          scale_colour_manual(values=trtmt_pal) +
          labs(x = "Effect Size on Shannon Diversity",
               y="Heat Duration (hrs)",
               colour = "Treatment\nDay")

print(fig4a)

png(filename="./figures/Fig4_legend.png", width = 3.40, height = 2.90, units = "in", res=300)
print(fig4a)
dev.off()
## png 
##   2
png(filename="./figures/Fig4A_plot.png", width = 4.48, height = 2.61, units = "in", res=300)
print(fig4a + theme(legend.position="none"))
dev.off()
## png 
##   2
# cleanup
#rm(diversity_forFit, diversity_6h, diversity_12h, diversity_24h, diversity_48h,
#   div_mods6h, div_mods12h, div_mods24h, div_mods48h,
#   emm_6h, emm_12h, emm_24h, emm_48h, effect_6h, effect_12h, effect_24h, effect_48h,
#   posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h,
#   get_effsize_CIs, get_posthoc, 
#   div_effects_protegens, div_effects,
#   plot_model_pred.CommRich, plot_model_pred.MU, fig4a)

The overall results match those found above, showing that excluding the replicates that went extinct had no impact on our conclusions.

Productivity

How is total community density impacted during and after heat? Let’s first plot it directly to get an idea of what we’re dealing with:

ggplot(absDen_forFit %>% filter(!is.na(Total_density)), # remove NA values
       aes(y=Total_density, x=Day, fill=community_expected_mu)) +
  facet_grid(protegens~as.factor(Heat)) +
  geom_quasirandom(alpha=0.7, pch=21) +
  scale_fill_viridis_c(option = "inferno", end=0.85) +
  #scale_y_log10() +
  scale_y_continuous(trans=scales::pseudo_log_trans(base = 10), # this prevents 0's from getting lost
                     breaks = 10^(-1:3)) + 
  labs(y = "Productivity",
       fill = "Expected\nCommunity mu")

Choose GLM Family

Now we repeat the same type of emmeans analysis as we did for diversity, but using the total density (i.e., a proxy for productivity). In this case I am a priori more comfortable using a Poisson or negative binomial family because total density is more like count data.

Remember that total densities below the threshold of detection from wells that DID recover during the recovery phase (i.e., those that did not go extinct) were assigned epsilon values corresponding to the threshold of detection. (Remaining NA values represent missing data due to pipetting mistakes or clogs during flow cytometry.) These below-detection values (i.e., epsilons) make up the majority of observations during resistance for the longest heat duration. See further discussion in the section below.
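As a concrete illustration, the epsilon substitution described above amounts to something like the following sketch. Note that `detection_threshold` and `recovered` are hypothetical names for this illustration, not necessarily the ones used upstream:

```r
# Sketch: wells that recovered but fell below the flow cytometer's detection
# limit get the threshold value (epsilon) instead of 0 or NA.
# `detection_threshold` and `recovered` are illustrative names only.
toy <- data.frame(Total_density = c(250, 0.002, NA, 1300),
                  recovered     = c(TRUE, TRUE, TRUE, TRUE))
detection_threshold <- 0.01
toy$Total_density <- ifelse(toy$recovered &
                              !is.na(toy$Total_density) &
                              toy$Total_density < detection_threshold,
                            detection_threshold, toy$Total_density)
toy$Total_density  # the below-detection well now carries epsilon = 0.01
```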

# scale the data by its standard deviation
absDen_forFit$TotDensity_scale <- scale(absDen_forFit$Total_density,
                                        scale = sd(absDen_forFit$Total_density, na.rm = TRUE),
                                        center = FALSE)
# the max scaled value is ~7.9 and almost 3% of the observations are 0 values
summary(absDen_forFit$TotDensity_scale)
##        V1        
##  Min.   :0.0000  
##  1st Qu.:0.1041  
##  Median :0.2425  
##  Mean   :0.6746  
##  3rd Qu.:0.8045  
##  Max.   :7.8953  
##  NA's   :9
# fraction of observations that are exactly 0 (na.rm so the NAs don't propagate)
sum(absDen_forFit$TotDensity_scale == 0, na.rm = TRUE) /
  sum(!is.na(absDen_forFit$TotDensity_scale))
# In fact, the total density data is even more long-tailed than the diversity data, which makes sense: diversity has a maximum possible value with 4 species.
hist(absDen_forFit$TotDensity_scale)

# re-arrange the levels so that emmeans can be run:
absDen_forFit$Heat <- as.character(absDen_forFit$Heat)
absDen_forFit$Heat[which(absDen_forFit$Heat == 0)] <- "control"
# !!! emmeans expects the control to be the very *last* level !!!
absDen_forFit$Heat <- factor(absDen_forFit$Heat,
                             levels = c("6", "12", "24", "48", "control"))
# let's keep CommRich and Day as numeric for now while we look for the best fitting GLM family

# let's compare different GLM families
try_gaussian <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
                        data = absDen_forFit,
                        control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_gaussian, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.936 0.828 0.464 0.604 0.104 0.448 0.092 0.328 0.448 0.5 0.616 0.668 0.692 0.144 0.468 0.484 0.828 0.54 0.136 1 ...
try_gamma <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
                     data = absDen_forFit,
                     family = ziGamma,
                     ziformula = ~1, # this needs to be added because there are 0 values in the data
                     control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation
simulateResiduals(fittedModel = try_gamma, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.812 0.964 0.5 0.748 0.336 0.532 0.336 0.612 0.46 0.596 0.64 0.78 0.816 0.368 0.508 0.636 0.748 0.664 0.376 0.952 ...
try_lognorm <- glmmTMB(TotDensity_scale ~ CommRich*Heat*Day*protegens,
                       data = absDen_forFit,
                       family = lognormal,
                       ziformula = ~1, # this needs to be added because there are 0 values in the data
                       control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = try_lognorm, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.836 0.876 0.356 0.612 0.26 0.384 0.176 0.484 0.32 0.436 0.668 0.596 0.708 0.164 0.372 0.472 0.796 0.468 0.272 0.964 ...
try_LOGlognorm <- glmmTMB(log(TotDensity_scale + 1) ~ CommRich*Heat*Day*protegens,
                          data = absDen_forFit,
                          family = lognormal,
                          ziformula = ~1, # I'm keeping this as 0-inflated lognormal alone was already over-dispersed. So I want to see if the log(x+1) transformation sufficiently brings in the long tail.
                          control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
simulateResiduals(fittedModel = try_LOGlognorm, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.776 0.912 0.356 0.644 0.26 0.396 0.196 0.48 0.32 0.448 0.624 0.644 0.74 0.192 0.384 0.48 0.776 0.492 0.252 0.928 ...
try_negbinom <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
                        data = absDen_forFit,
                        family = nbinom2,
                        control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.776 0.944 0.536 0.784 0.404 0.612 0.456 0.548 0.5558496 0.6970012 0.6674832 0.84 0.848 0.444 0.54 0.6242595 0.748 0.6612216 0.408 0.892 ...
try_negbinom0 <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
                         data = absDen_forFit,
                         family = nbinom2,
                         ziformula = ~1, # try zero inflated distribution
                         control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_negbinom0, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.752 0.952 0.508 0.76 0.344 0.568 0.364 0.516 0.504 0.676 0.644 0.876 0.872 0.384 0.564 0.64 0.732 0.736 0.404 0.956 ...
try_poisson <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
                       data = absDen_forFit,
                       family = genpois,
                       control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.788 0.804 0.3376638 0.612 0.244 0.416 0.144 0.392 0.324256 0.496 0.592 0.668 0.6574457 0.188 0.384 0.436 0.756 0.508 0.2 0.924 ...
try_poisson0 <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich*Heat*Day*protegens,
                        data = absDen_forFit,
                        family = genpois,
                        ziformula = ~1, # try zero inflated distribution
                        control = glmmTMBControl(optCtrl = list(iter.max = 10000,eval.max = 10000)))
simulateResiduals(fittedModel = try_poisson0, plot = TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details
## qu = 0.25, log(sigma) = -2.001813 : outer Newton did not converge fully.

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.784 0.824 0.384 0.636 0.188 0.3772009 0.232 0.464 0.364 0.4753274 0.612 0.68 0.7 0.208 0.356 0.436 0.748 0.5187042 0.208 0.952 ...
# let's check this with AIC and BIC
AIC(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm,
    try_negbinom, try_negbinom0, try_poisson, try_poisson0) %>% arrange(AIC)
BIC(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm,
    try_negbinom, try_negbinom0, try_poisson, try_poisson0) %>% arrange(BIC)
# clean up
rm(try_gaussian, try_gamma, try_lognorm, try_LOGlognorm, try_negbinom, try_negbinom0, try_poisson, try_poisson0)

Okay, so let’s go with the (generalized) Poisson family. Its residuals look a little worse than those of the log(x+1)-transformed lognormal, but I am wary of that model, whereas a Poisson-type family is what one would expect for count-style data like the total density.
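Since a count family requires a non-negative integer response, the fits above convert the sd-scaled densities into pseudo-counts by multiplying by 1000 and truncating; a toy sketch of that conversion:

```r
# Sketch: sd-scaled densities -> integer pseudo-counts for count-family GLMs.
# Multiplying by 1000 retains ~3 decimal digits of resolution before
# as.integer() truncates toward zero.
x <- c(0, 0.5, 0.8045, 7.25)   # example scaled density values
as.integer(x * 1000)           # 0  500  804  7250
```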

Compare fit of different models to data subsets

Now we will do the same thing we did for diversity: split the data into subsets by heat pulse duration, calculate the information criteria (AIC, AICc, and BIC) for each data subset, and average them across all subsets.
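The averaging step can also be written compactly; a sketch, assuming the four per-duration model lists (as created for diversity) exist and share element names:

```r
# Sketch: average an information criterion across the four heat-duration
# model lists (assumes identically named glmmTMB fits in each list).
mean_IC <- function(mod_lists, IC = AIC) {
  Reduce(`+`, lapply(mod_lists, function(l) sapply(l, IC))) / length(mod_lists)
}
# e.g. mean_IC(list(div_mods6h, div_mods12h, div_mods24h, div_mods48h), BIC)
```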

# a function to fit the different models to the subsetted data:
fit_productivity_models <- function(data_subset) {
  # create list for output
  output.ls <- list()
  
  # this is the simplest model. I'm fitting it to check for collinearity
  output.ls[["simple"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich + Heat + Trtmt_Day + protegens + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  # this is another simple model to check for collinearity
  output.ls[["simple resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ CommRich + Heat + Trtmt_Day + community_expected_mu + resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  # this is our null model:
  output.ls[["H0"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  # CommRich as an effect:
  output.ls[["+CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
    output.ls[["*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  # Resistance to 40C as an effect:
  output.ls[["+resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
    output.ls[["*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
  # protegens presence as an effect:
  output.ls[["+prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
    output.ls[["*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
    # expected community growth rate as an effect:
    output.ls[["+mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
  
    output.ls[["*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # interactions of CommRich with resistance
    output.ls[["+CommRich +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))

    output.ls[["*CommRich + CommRich*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*resist +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*resist + resist*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + resistant*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # interactions of CommRich with protegens
    output.ls[["+CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich + CommRich*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot + prot*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # interactions of CommRich with expected community growth rate
    output.ls[["+CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + CommRich + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich + CommRich*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + CommRich*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # interactions of resistance with expected community growth rate
    output.ls[["+resist +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + resistant + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*resist*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*resist +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*resist + resist*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*resistant + resistant*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu +resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu + mu*resist"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*resistant,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # interactions of protegens with expected community growth rate
    output.ls[["+prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day + protegens + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot + prot*mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu + mu*prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    # I also consider models with even more predictors:
    # e.g., combining protegens, mu, and CommRich
    output.ls[["*prot +mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*community_expected_mu + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot + prot*mu +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + protegens*community_expected_mu + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot +mu + prot*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens + community_expected_mu + protegens*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*prot*CommRich +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*protegens*CommRich + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu +prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu + mu*prot +CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + community_expected_mu*protegens + CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu +prot + mu*CommRich"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu + protegens + community_expected_mu*CommRich,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*mu*CommRich +prot"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*community_expected_mu*CommRich + protegens,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))
    
    output.ls[["*CommRich +prot +mu"]] <- glmmTMB(as.integer(TotDensity_scale * 1000) ~ Heat*Trtmt_Day*CommRich + protegens + community_expected_mu,
                               data = data_subset,
                               family = genpois,
                               control = glmmTMBControl(optCtrl = list(iter.max = 500000,eval.max = 500000)))

  return(output.ls)
}
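
The long run of near-identical `glmmTMB` calls above could be compacted by looping over a named vector of right-hand-side formulas, so the response, family, and optimizer settings are written only once. A minimal sketch (only three example entries are shown; the full set would be copied from the names used in `output.ls` above):

```r
# Sketch: fit the candidate models by looping over named RHS formulas.
rhs <- c("*CommRich" = "Heat*Trtmt_Day*CommRich",
         "+resist"   = "Heat*Trtmt_Day + resistant",
         "*prot*mu +CommRich" =
           "Heat*Trtmt_Day*protegens*community_expected_mu + CommRich")

fit_one <- function(f, dat) {
  glmmTMB(as.formula(paste("as.integer(TotDensity_scale * 1000) ~", f)),
          data = dat, family = genpois,
          control = glmmTMBControl(optCtrl = list(iter.max = 500000,
                                                  eval.max = 500000)))
}

# lapply() preserves the names of rhs, so downstream code indexing
# output.ls by model name would work unchanged.
output.ls <- lapply(rhs, fit_one, dat = data_subset)
```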

# a function to plot model predictions against the data
  # no colour mapping; just facets for protegens (rows) by Heat (columns)
plot_model_pred.nocolours <- function(mod_list, mod_name){
  # create data.frame for plotting
  absDen_predict <- cbind(mod_list[[mod_name]]$frame,
                          predicted=predict(mod_list[[mod_name]], type="response"))
  # change the first column name for easier plotting
  colnames(absDen_predict)[1] <- "observed"
  # create the plot
  out <- ggplot(absDen_predict, 
                aes(x=Trtmt_Day, y=observed)) +
         facet_grid(protegens ~ Heat) +
         geom_jitter(alpha=0.4) +
         geom_line(aes(y=predicted, group=paste(Heat, protegens)), colour="red") +
         scale_y_log10() +
         labs(y="Absolute Density (rescaled)",
              title=paste(mod_name, "model predictions"))
  return(out)
}

# a function to plot model predictions against the data
  # INOCULATED COMMUNITY RICHNESS is plotted as different colours
plot_model_pred.CommRich <- function(mod_list, mod_name){
  # create data.frame for plotting
  absDen_predict <- cbind(mod_list[[mod_name]]$frame,
                          predicted=predict(mod_list[[mod_name]], type="response"))
  # change the first column name for easier plotting
  colnames(absDen_predict)[1] <- "observed"
  # create the plot
  out <- ggplot(absDen_predict, 
                aes(x=Trtmt_Day, y=observed, colour=as.factor(CommRich))) +
         facet_grid(protegens ~ Heat) +
         geom_jitter(alpha=0.4) +
         geom_line(aes(y=predicted, group=as.factor(CommRich))) +
         scale_y_log10() +
         scale_colour_viridis_d(option = "viridis", begin=0.1, end=0.85) +
         labs(y="Absolute Density (rescaled)",
              colour="CommRich",
              title=paste(mod_name, "model predictions"))
  return(out)
}

# a function to plot model predictions against the data
  # EXPECTED COMMUNITY MU is plotted as different colours
plot_model_pred.MU <- function(mod_list, mod_name){
  # create data.frame for plotting
  absDen_predict <- cbind(mod_list[[mod_name]]$frame,
                          predicted=predict(mod_list[[mod_name]], type="response"))
  # change the first column name for easier plotting
  colnames(absDen_predict)[1] <- "observed"
  # create the plot
  out <- ggplot(absDen_predict, 
                aes(x=Trtmt_Day, y=observed,
                    colour=community_expected_mu, group=as.factor(community_expected_mu))) +
         facet_grid(protegens ~ Heat) +
         geom_jitter(alpha=0.4) +
         geom_line(aes(y=predicted)) +
         scale_y_log10() +
         scale_colour_viridis_c(option = "inferno", end=0.85) +
         labs(y="Absolute Density (rescaled)",
              colour="Expected\nCommunity mu",
              title=paste(mod_name, "model predictions"))
  return(out)
}

####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
absDen_6h <- rbind(absDen_forFit %>% filter(Heat == "6"),
                   absDen_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_6h$Trtmt_Day <- "resist"
absDen_6h$Trtmt_Day[absDen_6h$Day == 2] <- "recov_1"
absDen_6h$Trtmt_Day[absDen_6h$Day == 3] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_6h$Trtmt_Day <- as.factor(absDen_6h$Trtmt_Day)
absDen_6h$Heat <- droplevels(absDen_6h$Heat)
absDen_6h$resistant <- as.factor(absDen_6h$resistant)
absDen_6h$protegens <- as.factor(absDen_6h$protegens)
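
Before fitting, a quick cross-tabulation can confirm that the Day-to-Trtmt_Day mapping came out as intended:

```r
# Sanity check (sketch): rows are Day, columns are the assigned Trtmt_Day
# level; each Day should fall entirely within a single column.
with(absDen_6h, table(Day, Trtmt_Day))
```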

# fit different models:
absDen_mods6h <- fit_productivity_models(absDen_6h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods6h[["simple"]])
check_collinearity(absDen_mods6h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods6h, npar_of_glmmTMB_fit),
           AIC  = sapply(absDen_mods6h, AIC),
           AICc = sapply(absDen_mods6h, AICc),
           BIC  = sapply(absDen_mods6h, BIC)) %>%
  mutate(dAIC  = AIC - min(AIC),
         dAICc = AICc - min(AICc),
         dBIC  = BIC - min(BIC)) %>%
  arrange(BIC)
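
For reference, if the `AICc()` helper used in the table above (defined earlier in the document) follows the standard small-sample correction, it amounts to AIC + 2k(k + 1)/(n - k - 1); a minimal sketch:

```r
# Sketch of the standard small-sample AIC correction
# (assuming this is what the AICc() helper implements):
AICc_sketch <- function(fit) {
  k <- attr(logLik(fit), "df")  # number of estimated parameters
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}
```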
# plot the best model for 6h:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*mu +prot"))

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot*mu +CommRich"))

# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot +mu + prot*CommRich"))

# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot*mu +CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.624 0.96 0.724 0.6761063 0.992 0.872 0.584 0.02 0.916 0.4455416 0.964 1 0.752 0.4 0.864 0.4146115 0.112 0.5655875 0.5192903 0.572 ...
summary(absDen_mods6h[["*prot*mu +CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *  
##     community_expected_mu + CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5633.3   5736.5  -2790.6   5581.3      366 
## 
## 
## Dispersion parameter for genpois family ():  304 
## 
## Conditional model:
##                                                               Estimate
## (Intercept)                                                    9.15663
## Heatcontrol                                                   -2.69889
## Trtmt_Dayrecov_2                                              -0.53159
## Trtmt_Dayresist                                               -3.32573
## protegens1                                                    -2.26618
## community_expected_mu                                         -2.09080
## CommRich                                                       0.02518
## Heatcontrol:Trtmt_Dayrecov_2                                   2.46173
## Heatcontrol:Trtmt_Dayresist                                    4.55108
## Heatcontrol:protegens1                                         0.09441
## Trtmt_Dayrecov_2:protegens1                                   -0.37024
## Trtmt_Dayresist:protegens1                                     3.81032
## Heatcontrol:community_expected_mu                              2.94727
## Trtmt_Dayrecov_2:community_expected_mu                         0.58031
## Trtmt_Dayresist:community_expected_mu                          3.86389
## protegens1:community_expected_mu                               1.15169
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                       -0.71601
## Heatcontrol:Trtmt_Dayresist:protegens1                        -3.77165
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu            -3.05173
## Heatcontrol:Trtmt_Dayresist:community_expected_mu             -5.13012
## Heatcontrol:protegens1:community_expected_mu                  -0.31375
## Trtmt_Dayrecov_2:protegens1:community_expected_mu              0.02075
## Trtmt_Dayresist:protegens1:community_expected_mu              -4.33957
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu  1.02566
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   4.73448
##                                                               Std. Error
## (Intercept)                                                      0.40488
## Heatcontrol                                                      0.59547
## Trtmt_Dayrecov_2                                                 0.56176
## Trtmt_Dayresist                                                  0.57510
## protegens1                                                       1.00410
## community_expected_mu                                            0.48536
## CommRich                                                         0.03105
## Heatcontrol:Trtmt_Dayrecov_2                                     0.96092
## Heatcontrol:Trtmt_Dayresist                                      0.90474
## Heatcontrol:protegens1                                           1.51133
## Trtmt_Dayrecov_2:protegens1                                      1.48404
## Trtmt_Dayresist:protegens1                                       1.43502
## Heatcontrol:community_expected_mu                                0.68689
## Trtmt_Dayrecov_2:community_expected_mu                           0.66397
## Trtmt_Dayresist:community_expected_mu                            0.65607
## protegens1:community_expected_mu                                 1.12505
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          2.30808
## Heatcontrol:Trtmt_Dayresist:protegens1                           2.13203
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu               1.11975
## Heatcontrol:Trtmt_Dayresist:community_expected_mu                1.03026
## Heatcontrol:protegens1:community_expected_mu                     1.67526
## Trtmt_Dayrecov_2:protegens1:community_expected_mu                1.66028
## Trtmt_Dayresist:protegens1:community_expected_mu                 1.60061
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    2.56853
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu     2.36475
##                                                               z value Pr(>|z|)
## (Intercept)                                                    22.616  < 2e-16
## Heatcontrol                                                    -4.532 5.83e-06
## Trtmt_Dayrecov_2                                               -0.946  0.34400
## Trtmt_Dayresist                                                -5.783 7.34e-09
## protegens1                                                     -2.257  0.02401
## community_expected_mu                                          -4.308 1.65e-05
## CommRich                                                        0.811  0.41732
## Heatcontrol:Trtmt_Dayrecov_2                                    2.562  0.01041
## Heatcontrol:Trtmt_Dayresist                                     5.030 4.90e-07
## Heatcontrol:protegens1                                          0.062  0.95019
## Trtmt_Dayrecov_2:protegens1                                    -0.249  0.80299
## Trtmt_Dayresist:protegens1                                      2.655  0.00793
## Heatcontrol:community_expected_mu                               4.291 1.78e-05
## Trtmt_Dayrecov_2:community_expected_mu                          0.874  0.38212
## Trtmt_Dayresist:community_expected_mu                           5.889 3.88e-09
## protegens1:community_expected_mu                                1.024  0.30599
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                        -0.310  0.75640
## Heatcontrol:Trtmt_Dayresist:protegens1                         -1.769  0.07689
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu             -2.725  0.00642
## Heatcontrol:Trtmt_Dayresist:community_expected_mu              -4.979 6.38e-07
## Heatcontrol:protegens1:community_expected_mu                   -0.187  0.85144
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               0.012  0.99003
## Trtmt_Dayresist:protegens1:community_expected_mu               -2.711  0.00670
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu   0.399  0.68966
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu    2.002  0.04527
##                                                                  
## (Intercept)                                                   ***
## Heatcontrol                                                   ***
## Trtmt_Dayrecov_2                                                 
## Trtmt_Dayresist                                               ***
## protegens1                                                    *  
## community_expected_mu                                         ***
## CommRich                                                         
## Heatcontrol:Trtmt_Dayrecov_2                                  *  
## Heatcontrol:Trtmt_Dayresist                                   ***
## Heatcontrol:protegens1                                           
## Trtmt_Dayrecov_2:protegens1                                      
## Trtmt_Dayresist:protegens1                                    ** 
## Heatcontrol:community_expected_mu                             ***
## Trtmt_Dayrecov_2:community_expected_mu                           
## Trtmt_Dayresist:community_expected_mu                         ***
## protegens1:community_expected_mu                                 
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          
## Heatcontrol:Trtmt_Dayresist:protegens1                        .  
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu            ** 
## Heatcontrol:Trtmt_Dayresist:community_expected_mu             ***
## Heatcontrol:protegens1:community_expected_mu                     
## Trtmt_Dayrecov_2:protegens1:community_expected_mu                
## Trtmt_Dayresist:protegens1:community_expected_mu              ** 
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu  *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot +mu + prot*CommRich"]], plot = TRUE)
summary(absDen_mods6h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5662.5   5726.0  -2815.2   5630.5      376 
## 
## 
## Dispersion parameter for genpois family ():  368 
## 
## Conditional model:
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.839618   0.228850   34.26  < 2e-16
## Heatcontrol                             -0.180861   0.118842   -1.52   0.1280
## Trtmt_Dayrecov_2                        -0.018307   0.097866   -0.19   0.8516
## Trtmt_Dayresist                         -0.008151   0.101418   -0.08   0.9359
## protegens1                              -1.304007   0.181747   -7.17 7.24e-13
## community_expected_mu                   -0.491750   0.214502   -2.29   0.0219
## CommRich                                -0.003971   0.044970   -0.09   0.9296
## Heatcontrol:Trtmt_Dayrecov_2            -0.137528   0.167598   -0.82   0.4119
## Heatcontrol:Trtmt_Dayresist              0.174712   0.164347    1.06   0.2878
## Heatcontrol:protegens1                  -0.061370   0.180549   -0.34   0.7339
## Trtmt_Dayrecov_2:protegens1             -0.333568   0.165252   -2.02   0.0435
## Trtmt_Dayresist:protegens1               0.061134   0.164067    0.37   0.7094
## protegens1:CommRich                      0.032276   0.064557    0.50   0.6171
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  0.075049   0.261004    0.29   0.7737
## Heatcontrol:Trtmt_Dayresist:protegens1   0.251918   0.250033    1.01   0.3137
##                                            
## (Intercept)                             ***
## Heatcontrol                                
## Trtmt_Dayrecov_2                           
## Trtmt_Dayresist                            
## protegens1                              ***
## community_expected_mu                   *  
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2               
## Heatcontrol:Trtmt_Dayresist                
## Heatcontrol:protegens1                     
## Trtmt_Dayrecov_2:protegens1             *  
## Trtmt_Dayresist:protegens1                 
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1    
## Heatcontrol:Trtmt_Dayresist:protegens1     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
absDen_12h <- rbind(absDen_forFit %>% filter(Heat == "12", Day > 1),
                    absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_12h$Trtmt_Day <- "resist"
absDen_12h$Trtmt_Day[absDen_12h$Day == 3] <- "recov_1"
absDen_12h$Trtmt_Day[absDen_12h$Day == 4] <- "recov_2"
# encode the grouping variables as factors and drop unused Heat levels
absDen_12h$Trtmt_Day <- as.factor(absDen_12h$Trtmt_Day)
absDen_12h$Heat <- droplevels(absDen_12h$Heat)
absDen_12h$resistant <- as.factor(absDen_12h$resistant)
absDen_12h$protegens <- as.factor(absDen_12h$protegens)
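The Day-to-phase labelling above (default "resist", then overwrite the recovery days) can equivalently be written with `dplyr::case_when()`; a toy sketch on a plain vector:

```r
library(dplyr)
# Toy version of the Day -> Trtmt_Day labelling used above (12h panel):
# Day 3 is the first recovery day, Day 4 the last, everything else "resist".
day <- c(2, 3, 4)
trtmt_day <- case_when(
  day == 3 ~ "recov_1",
  day == 4 ~ "recov_2",
  .default = "resist"
)
trtmt_day  # "resist" "recov_1" "recov_2"
```

This gives the same result as the base-R assignments; which to use is a style choice.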

# fit different models:
absDen_mods12h <- fit_productivity_models(absDen_12h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods12h[["simple"]])
check_collinearity(absDen_mods12h[["simple resist"]])
# print a summary table of the model fits
# (dAIC/dAICc/dBIC are min(IC) - IC, so the best model scores 0 and worse models are negative)
data.frame(pars = sapply(absDen_mods12h, npar_of_glmmTMB_fit),
           AIC = sapply(absDen_mods12h, AIC),
           AICc = sapply(absDen_mods12h, AICc),
           BIC = sapply(absDen_mods12h, BIC)) %>%
  mutate(dAIC = min(AIC) - AIC,
         dAICc = min(AICc) - AICc,
         dBIC = min(BIC) - BIC) %>%
  arrange(BIC)
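For reference, the small-sample correction behind AICc (however the `AICc()` function used here implements it) is a simple closed form; a minimal base-R sketch, assuming `k` estimated parameters and `n` observations:

```r
# AICc = AIC + 2k(k + 1) / (n - k - 1); the correction term grows as the
# number of parameters k approaches the sample size n.
aicc_from_aic <- function(aic, k, n) aic + (2 * k * (k + 1)) / (n - k - 1)

aicc_from_aic(100, 5, 30)  # 102.5
```

With the sample sizes here (a few hundred residual df), AICc and AIC rank models almost identically, which is why the table reports both alongside BIC.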
# plot the best model for 12h:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*mu +prot"))

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot*mu +CommRich"))

# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot +mu + prot*CommRich"))

# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot*mu +CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.876 0.632 0.2287389 0.686251 0.2893171 0.468 0.08077386 0.628 0.82 0.42 0.672 0.708 0.532 0.4667235 0.568 0.584 0.4163186 0.3632073 0.5082269 0.92 ...
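Under a well-specified model, DHARMa's scaled residuals should be approximately Uniform(0, 1); the `plot = TRUE` call above shows the standard QQ and residual-vs-predicted diagnostics. As a rough illustration (not a substitute for DHARMa's own tests), a one-sample KS test against the uniform distribution on the first few printed values:

```r
# First ten scaled residuals printed above; under a correct model these are
# ~Uniform(0, 1), so a KS test against punif should find no strong deviation.
resids <- c(0.876, 0.632, 0.229, 0.686, 0.289, 0.468, 0.081, 0.628, 0.820, 0.420)
ks <- ks.test(resids, "punif")
ks$p.value  # a large p-value is consistent with uniformity
```

DHARMa's built-in `testUniformity()` performs this check on the full residual vector.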
summary(absDen_mods12h[["*prot*mu +CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *  
##     community_expected_mu + CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   4909.8   5010.3  -2428.9   4857.8      327 
## 
## 
## Dispersion parameter for genpois family ():  367 
## 
## Conditional model:
##                                                               Estimate
## (Intercept)                                                    8.09791
## Heatcontrol                                                    0.25448
## Trtmt_Dayrecov_2                                               0.71753
## Trtmt_Dayresist                                               -4.82578
## protegens1                                                    -0.74091
## community_expected_mu                                         -1.10117
## CommRich                                                       0.05516
## Heatcontrol:Trtmt_Dayrecov_2                                  -1.20770
## Heatcontrol:Trtmt_Dayresist                                    2.87963
## Heatcontrol:protegens1                                        -2.54842
## Trtmt_Dayrecov_2:protegens1                                   -3.98416
## Trtmt_Dayresist:protegens1                                     3.37405
## Heatcontrol:community_expected_mu                             -0.50272
## Trtmt_Dayrecov_2:community_expected_mu                        -1.27493
## Trtmt_Dayresist:community_expected_mu                          4.86917
## protegens1:community_expected_mu                              -0.74334
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                        4.54307
## Heatcontrol:Trtmt_Dayresist:protegens1                        -2.22955
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu             1.98282
## Heatcontrol:Trtmt_Dayresist:community_expected_mu             -2.37210
## Heatcontrol:protegens1:community_expected_mu                   2.68624
## Trtmt_Dayrecov_2:protegens1:community_expected_mu              4.67950
## Trtmt_Dayresist:protegens1:community_expected_mu              -3.24580
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu -5.28624
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   2.11439
##                                                               Std. Error
## (Intercept)                                                      0.66174
## Heatcontrol                                                      0.95852
## Trtmt_Dayrecov_2                                                 1.00213
## Trtmt_Dayresist                                                  0.87125
## protegens1                                                       1.26176
## community_expected_mu                                            0.79076
## CommRich                                                         0.04032
## Heatcontrol:Trtmt_Dayrecov_2                                     1.35549
## Heatcontrol:Trtmt_Dayresist                                      1.20260
## Heatcontrol:protegens1                                           1.87852
## Trtmt_Dayrecov_2:protegens1                                      1.90107
## Trtmt_Dayresist:protegens1                                       1.71937
## Heatcontrol:community_expected_mu                                1.13250
## Trtmt_Dayrecov_2:community_expected_mu                           1.20788
## Trtmt_Dayresist:community_expected_mu                            1.00848
## protegens1:community_expected_mu                                 1.43013
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          2.78351
## Heatcontrol:Trtmt_Dayresist:protegens1                           2.50706
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu               1.60793
## Heatcontrol:Trtmt_Dayresist:community_expected_mu                1.38954
## Heatcontrol:protegens1:community_expected_mu                     2.11453
## Trtmt_Dayrecov_2:protegens1:community_expected_mu                2.15578
## Trtmt_Dayresist:protegens1:community_expected_mu                 1.92965
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    3.13427
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu     2.79618
##                                                               z value Pr(>|z|)
## (Intercept)                                                    12.237  < 2e-16
## Heatcontrol                                                     0.265   0.7906
## Trtmt_Dayrecov_2                                                0.716   0.4740
## Trtmt_Dayresist                                                -5.539 3.04e-08
## protegens1                                                     -0.587   0.5571
## community_expected_mu                                          -1.393   0.1638
## CommRich                                                        1.368   0.1713
## Heatcontrol:Trtmt_Dayrecov_2                                   -0.891   0.3729
## Heatcontrol:Trtmt_Dayresist                                     2.394   0.0166
## Heatcontrol:protegens1                                         -1.357   0.1749
## Trtmt_Dayrecov_2:protegens1                                    -2.096   0.0361
## Trtmt_Dayresist:protegens1                                      1.962   0.0497
## Heatcontrol:community_expected_mu                              -0.444   0.6571
## Trtmt_Dayrecov_2:community_expected_mu                         -1.056   0.2912
## Trtmt_Dayresist:community_expected_mu                           4.828 1.38e-06
## protegens1:community_expected_mu                               -0.520   0.6032
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         1.632   0.1027
## Heatcontrol:Trtmt_Dayresist:protegens1                         -0.889   0.3738
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu              1.233   0.2175
## Heatcontrol:Trtmt_Dayresist:community_expected_mu              -1.707   0.0878
## Heatcontrol:protegens1:community_expected_mu                    1.270   0.2040
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               2.171   0.0300
## Trtmt_Dayresist:protegens1:community_expected_mu               -1.682   0.0926
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu  -1.687   0.0917
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu    0.756   0.4495
##                                                                  
## (Intercept)                                                   ***
## Heatcontrol                                                      
## Trtmt_Dayrecov_2                                                 
## Trtmt_Dayresist                                               ***
## protegens1                                                       
## community_expected_mu                                            
## CommRich                                                         
## Heatcontrol:Trtmt_Dayrecov_2                                     
## Heatcontrol:Trtmt_Dayresist                                   *  
## Heatcontrol:protegens1                                           
## Trtmt_Dayrecov_2:protegens1                                   *  
## Trtmt_Dayresist:protegens1                                    *  
## Heatcontrol:community_expected_mu                                
## Trtmt_Dayrecov_2:community_expected_mu                           
## Trtmt_Dayresist:community_expected_mu                         ***
## protegens1:community_expected_mu                                 
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          
## Heatcontrol:Trtmt_Dayresist:protegens1                           
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu               
## Heatcontrol:Trtmt_Dayresist:community_expected_mu             .  
## Heatcontrol:protegens1:community_expected_mu                     
## Trtmt_Dayrecov_2:protegens1:community_expected_mu             *  
## Trtmt_Dayresist:protegens1:community_expected_mu              .  
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu .  
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.928 0.884 0.2 0.628 0.2543694 0.488 0 0.598251 0.904 0.4106342 0.672 0.548 0.52 0.4607739 0.572 0.6 0.1 0.3307235 0.852 0.964 ...
simulateResiduals(fittedModel = absDen_mods12h[["*prot*mu +CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.876 0.632 0.2287389 0.686251 0.2893171 0.468 0.08077386 0.628 0.82 0.42 0.672 0.708 0.532 0.4667235 0.568 0.584 0.4163186 0.3632073 0.5082269 0.92 ...
summary(absDen_mods12h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   4935.9   4997.8  -2452.0   4903.9      337 
## 
## 
## Dispersion parameter for genpois family ():  446 
## 
## Conditional model:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.01699    0.32692  21.464  < 2e-16
## Heatcontrol                             -0.21263    0.13850  -1.535   0.1247
## Trtmt_Dayrecov_2                        -0.37577    0.14055  -2.674   0.0075
## Trtmt_Dayresist                         -0.86993    0.16953  -5.131 2.88e-07
## protegens1                              -1.25584    0.23759  -5.286 1.25e-07
## community_expected_mu                    0.07319    0.29296   0.250   0.8027
## CommRich                                 0.11143    0.07060   1.578   0.1145
## Heatcontrol:Trtmt_Dayrecov_2             0.51493    0.20295   2.537   0.0112
## Heatcontrol:Trtmt_Dayresist              1.11031    0.21161   5.247 1.55e-07
## Heatcontrol:protegens1                  -0.09314    0.20466  -0.455   0.6490
## Trtmt_Dayrecov_2:protegens1              0.18835    0.20302   0.928   0.3535
## Trtmt_Dayresist:protegens1               0.89448    0.21920   4.081 4.49e-05
## protegens1:CommRich                     -0.08730    0.08660  -1.008   0.3134
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.17894    0.29796  -0.601   0.5481
## Heatcontrol:Trtmt_Dayresist:protegens1  -0.71900    0.29616  -2.428   0.0152
##                                            
## (Intercept)                             ***
## Heatcontrol                                
## Trtmt_Dayrecov_2                        ** 
## Trtmt_Dayresist                         ***
## protegens1                              ***
## community_expected_mu                      
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2            *  
## Heatcontrol:Trtmt_Dayresist             ***
## Heatcontrol:protegens1                     
## Trtmt_Dayrecov_2:protegens1                
## Trtmt_Dayresist:protegens1              ***
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1    
## Heatcontrol:Trtmt_Dayresist:protegens1  *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
absDen_24h <- rbind(absDen_forFit %>% filter(Heat == "24", Day > 1),
                    absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_24h$Trtmt_Day <- "resist"
absDen_24h$Trtmt_Day[absDen_24h$Day == 3] <- "recov_1"
absDen_24h$Trtmt_Day[absDen_24h$Day == 4] <- "recov_2"
# encode the grouping variables as factors and drop unused Heat levels
absDen_24h$Trtmt_Day <- as.factor(absDen_24h$Trtmt_Day)
absDen_24h$Heat <- droplevels(absDen_24h$Heat)
absDen_24h$resistant <- as.factor(absDen_24h$resistant)
absDen_24h$protegens <- as.factor(absDen_24h$protegens)

# fit different models:
absDen_mods24h <- fit_productivity_models(absDen_24h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods24h[["simple"]])
check_collinearity(absDen_mods24h[["simple resist"]])
# print a summary table of the model fits
# (dAIC/dAICc/dBIC are min(IC) - IC, so the best model scores 0 and worse models are negative)
data.frame(pars = sapply(absDen_mods24h, npar_of_glmmTMB_fit),
           AIC = sapply(absDen_mods24h, AIC),
           AICc = sapply(absDen_mods24h, AICc),
           BIC = sapply(absDen_mods24h, BIC)) %>%
  mutate(dAIC = min(AIC) - AIC,
         dAICc = min(AICc) - AICc,
         dBIC = min(BIC) - BIC) %>%
  arrange(BIC)
# plot a top-ranked model that is unique to 24h:
  # note that this is the third-best model for 24h:
print(plot_model_pred.CommRich(mod_list=absDen_mods24h, mod_name="*prot*CommRich +mu"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the best model for the complete data:
  # note that this is also the best-ranked model for 24h:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the preferred model for the complete data:
  # note that this is also the second-best model for 24h:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot*mu +CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.4549449 0.446523 0.544 0.2972669 0.8025741 0.6789442 0.612 0.4374037 0.2729079 0.3030947 0.664 0.592 0.303566 0.1624146 0.14 0.1910651 0.2 0.14 0.3627356 0.3717779 ...
summary(absDen_mods24h[["*prot*mu +CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *  
##     community_expected_mu + CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   4966.6   5067.2  -2457.3   4914.6      328 
## 
## 
## Dispersion parameter for genpois family ():  898 
## 
## Conditional model:
##                                                                Estimate
## (Intercept)                                                    -1.06385
## Heatcontrol                                                     9.28614
## Trtmt_Dayrecov_2                                                3.74130
## Trtmt_Dayresist                                                 6.35728
## protegens1                                                      7.06537
## community_expected_mu                                           7.83934
## CommRich                                                        0.17510
## Heatcontrol:Trtmt_Dayrecov_2                                   -4.12166
## Heatcontrol:Trtmt_Dayresist                                    -8.30158
## Heatcontrol:protegens1                                        -10.42663
## Trtmt_Dayrecov_2:protegens1                                    -4.73014
## Trtmt_Dayresist:protegens1                                     -4.97428
## Heatcontrol:community_expected_mu                              -9.33776
## Trtmt_Dayrecov_2:community_expected_mu                         -3.14872
## Trtmt_Dayresist:community_expected_mu                          -9.08213
## protegens1:community_expected_mu                               -7.86491
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         5.25834
## Heatcontrol:Trtmt_Dayresist:protegens1                          6.27755
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu              3.74696
## Heatcontrol:Trtmt_Dayresist:community_expected_mu              11.59928
## Heatcontrol:protegens1:community_expected_mu                    9.98283
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               3.77422
## Trtmt_Dayresist:protegens1:community_expected_mu                7.04546
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu  -4.35323
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   -8.42221
##                                                               Std. Error
## (Intercept)                                                      0.61438
## Heatcontrol                                                      1.09770
## Trtmt_Dayrecov_2                                                 0.78556
## Trtmt_Dayresist                                                  1.21326
## protegens1                                                       1.33830
## community_expected_mu                                            0.58303
## CommRich                                                         0.04775
## Heatcontrol:Trtmt_Dayrecov_2                                     1.44037
## Heatcontrol:Trtmt_Dayresist                                      1.62354
## Heatcontrol:protegens1                                           2.07681
## Trtmt_Dayrecov_2:protegens1                                      1.91362
## Trtmt_Dayresist:protegens1                                       2.16799
## Heatcontrol:community_expected_mu                                1.21212
## Trtmt_Dayrecov_2:community_expected_mu                           0.75005
## Trtmt_Dayresist:community_expected_mu                            1.29511
## protegens1:community_expected_mu                                 1.42804
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          2.99065
## Heatcontrol:Trtmt_Dayresist:protegens1                           3.00853
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu               1.58514
## Heatcontrol:Trtmt_Dayresist:community_expected_mu                1.79943
## Heatcontrol:protegens1:community_expected_mu                     2.28848
## Trtmt_Dayrecov_2:protegens1:community_expected_mu                2.06167
## Trtmt_Dayresist:protegens1:community_expected_mu                 2.36909
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    3.30114
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu     3.31968
##                                                               z value Pr(>|z|)
## (Intercept)                                                    -1.732 0.083348
## Heatcontrol                                                     8.460  < 2e-16
## Trtmt_Dayrecov_2                                                4.763 1.91e-06
## Trtmt_Dayresist                                                 5.240 1.61e-07
## protegens1                                                      5.279 1.30e-07
## community_expected_mu                                          13.446  < 2e-16
## CommRich                                                        3.667 0.000246
## Heatcontrol:Trtmt_Dayrecov_2                                   -2.862 0.004216
## Heatcontrol:Trtmt_Dayresist                                    -5.113 3.17e-07
## Heatcontrol:protegens1                                         -5.020 5.15e-07
## Trtmt_Dayrecov_2:protegens1                                    -2.472 0.013443
## Trtmt_Dayresist:protegens1                                     -2.294 0.021767
## Heatcontrol:community_expected_mu                              -7.704 1.32e-14
## Trtmt_Dayrecov_2:community_expected_mu                         -4.198 2.69e-05
## Trtmt_Dayresist:community_expected_mu                          -7.013 2.34e-12
## protegens1:community_expected_mu                               -5.507 3.64e-08
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         1.758 0.078703
## Heatcontrol:Trtmt_Dayresist:protegens1                          2.087 0.036925
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu              2.364 0.018088
## Heatcontrol:Trtmt_Dayresist:community_expected_mu               6.446 1.15e-10
## Heatcontrol:protegens1:community_expected_mu                    4.362 1.29e-05
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               1.831 0.067151
## Trtmt_Dayresist:protegens1:community_expected_mu                2.974 0.002940
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu  -1.319 0.187268
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   -2.537 0.011179
##                                                                  
## (Intercept)                                                   .  
## Heatcontrol                                                   ***
## Trtmt_Dayrecov_2                                              ***
## Trtmt_Dayresist                                               ***
## protegens1                                                    ***
## community_expected_mu                                         ***
## CommRich                                                      ***
## Heatcontrol:Trtmt_Dayrecov_2                                  ** 
## Heatcontrol:Trtmt_Dayresist                                   ***
## Heatcontrol:protegens1                                        ***
## Trtmt_Dayrecov_2:protegens1                                   *  
## Trtmt_Dayresist:protegens1                                    *  
## Heatcontrol:community_expected_mu                             ***
## Trtmt_Dayrecov_2:community_expected_mu                        ***
## Trtmt_Dayresist:community_expected_mu                         ***
## protegens1:community_expected_mu                              ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                       .  
## Heatcontrol:Trtmt_Dayresist:protegens1                        *  
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu            *  
## Heatcontrol:Trtmt_Dayresist:community_expected_mu             ***
## Heatcontrol:protegens1:community_expected_mu                  ***
## Trtmt_Dayrecov_2:protegens1:community_expected_mu             .  
## Trtmt_Dayresist:protegens1:community_expected_mu              ** 
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu  *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.5304174 0.3788716 0.2106584 0.8585741 0.728 0.5731777 0.5213509 0.12 0.2060813 0.3513732 0.728 0.768 0.6823773 0.1184146 0.192 0.1202988 0.552 0.316 0.068 0.2980919 ...
summary(absDen_mods24h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5032.2   5094.1  -2500.1   5000.2      338 
## 
## 
## Dispersion parameter for genpois family (): 1.13e+03 
## 
## Conditional model:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              2.63860    0.34784   7.586 3.31e-14
## Heatcontrol                              0.46860    0.25896   1.810 0.070365
## Trtmt_Dayrecov_2                         1.04463    0.21234   4.920 8.67e-07
## Trtmt_Dayresist                         -2.14902    0.28138  -7.637 2.22e-14
## protegens1                               1.13299    0.26296   4.309 1.64e-05
## community_expected_mu                    2.78569    0.37170   7.494 6.66e-14
## CommRich                                 0.73825    0.07891   9.356  < 2e-16
## Heatcontrol:Trtmt_Dayrecov_2            -0.80677    0.28113  -2.870 0.004108
## Heatcontrol:Trtmt_Dayresist              2.69517    0.34109   7.902 2.75e-15
## Heatcontrol:protegens1                  -0.99125    0.30543  -3.245 0.001173
## Trtmt_Dayrecov_2:protegens1             -1.43914    0.26652  -5.400 6.67e-08
## Trtmt_Dayresist:protegens1               1.64247    0.32041   5.126 2.96e-07
## protegens1:CommRich                     -0.67710    0.09426  -7.183 6.82e-13
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  1.38150    0.36498   3.785 0.000154
## Heatcontrol:Trtmt_Dayresist:protegens1  -1.77160    0.40483  -4.376 1.21e-05
##                                            
## (Intercept)                             ***
## Heatcontrol                             .  
## Trtmt_Dayrecov_2                        ***
## Trtmt_Dayresist                         ***
## protegens1                              ***
## community_expected_mu                   ***
## CommRich                                ***
## Heatcontrol:Trtmt_Dayrecov_2            ** 
## Heatcontrol:Trtmt_Dayresist             ***
## Heatcontrol:protegens1                  ** 
## Trtmt_Dayrecov_2:protegens1             ***
## Trtmt_Dayresist:protegens1              ***
## protegens1:CommRich                     ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 ***
## Heatcontrol:Trtmt_Dayresist:protegens1  ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
absDen_48h <- rbind(absDen_forFit %>% filter(Heat == "48", Day > 2),
                    absDen_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_48h$Trtmt_Day <- "resist"
absDen_48h$Trtmt_Day[absDen_48h$Day == 4] <- "recov_1"
absDen_48h$Trtmt_Day[absDen_48h$Day == 5] <- "recov_2"
# encode the grouping variables as factors and drop unused Heat levels
absDen_48h$Trtmt_Day <- as.factor(absDen_48h$Trtmt_Day)
absDen_48h$Heat <- droplevels(absDen_48h$Heat)
absDen_48h$resistant <- as.factor(absDen_48h$resistant)
absDen_48h$protegens <- as.factor(absDen_48h$protegens)

# fit different models:
absDen_mods48h <- fit_productivity_models(absDen_48h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods48h[["simple"]])
check_collinearity(absDen_mods48h[["simple resist"]])
# print a summary table of the model fits
# (dAIC/dAICc/dBIC are min(IC) - IC, so the best model scores 0 and worse models are negative)
data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
           AIC = sapply(absDen_mods48h, AIC),
           AICc = sapply(absDen_mods48h, AICc),
           BIC = sapply(absDen_mods48h, BIC)) %>%
  mutate(dAIC = min(AIC) - AIC,
         dAICc = min(AICc) - AICc,
         dBIC = min(BIC) - BIC) %>%
  arrange(BIC)
# plot the best model for 48h:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods48h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the preferred model for the complete data:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# check the fit and estimates of the best model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot*mu +CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.6730612 0.6791969 0.7916869 0.6795164 0.6937438 0.7163217 0.788161 0.6591218 0.6644682 0.6291745 0.2536879 0.6056463 0.5591551 0.3618208 0.2980939 0.2284036 0.3769242 0.2488986 0.2111782 0.7354215 ...
summary(absDen_mods48h[["*prot*mu +CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens *  
##     community_expected_mu + CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   3750.0   3847.9  -1849.0   3698.0      293 
## 
## 
## Dispersion parameter for genpois family ():  555 
## 
## Conditional model:
##                                                                 Estimate
## (Intercept)                                                    -4.291133
## Heatcontrol                                                    12.213336
## Trtmt_Dayrecov_2                                                0.002528
## Trtmt_Dayresist                                                 8.305382
## protegens1                                                     10.632949
## community_expected_mu                                           8.001702
## CommRich                                                        0.035916
## Heatcontrol:Trtmt_Dayrecov_2                                    0.809306
## Heatcontrol:Trtmt_Dayresist                                    -7.854578
## Heatcontrol:protegens1                                        -13.173287
## Trtmt_Dayrecov_2:protegens1                                    -0.476146
## Trtmt_Dayresist:protegens1                                    -11.203126
## Heatcontrol:community_expected_mu                              -8.837146
## Trtmt_Dayrecov_2:community_expected_mu                         -0.004589
## Trtmt_Dayresist:community_expected_mu                          -9.397868
## protegens1:community_expected_mu                               -8.575331
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         0.061785
## Heatcontrol:Trtmt_Dayresist:protegens1                         10.691756
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu             -1.998287
## Heatcontrol:Trtmt_Dayresist:community_expected_mu               8.748105
## Heatcontrol:protegens1:community_expected_mu                    9.795295
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               0.293979
## Trtmt_Dayresist:protegens1:community_expected_mu                8.938939
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu   1.008883
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   -8.388933
##                                                               Std. Error
## (Intercept)                                                     1.696541
## Heatcontrol                                                     1.810265
## Trtmt_Dayrecov_2                                                2.379266
## Trtmt_Dayresist                                                 6.583168
## protegens1                                                      2.047178
## community_expected_mu                                           1.646305
## CommRich                                                        0.048699
## Heatcontrol:Trtmt_Dayrecov_2                                    2.669089
## Heatcontrol:Trtmt_Dayresist                                     6.656851
## Heatcontrol:protegens1                                          2.561747
## Trtmt_Dayrecov_2:protegens1                                     2.918422
## Trtmt_Dayresist:protegens1                                      7.311226
## Heatcontrol:community_expected_mu                               1.809675
## Trtmt_Dayrecov_2:community_expected_mu                          2.322114
## Trtmt_Dayresist:community_expected_mu                           6.417814
## protegens1:community_expected_mu                                2.084017
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         3.671418
## Heatcontrol:Trtmt_Dayresist:protegens1                          7.610139
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu              2.716498
## Heatcontrol:Trtmt_Dayresist:community_expected_mu               6.519094
## Heatcontrol:protegens1:community_expected_mu                    2.705234
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               2.975798
## Trtmt_Dayresist:protegens1:community_expected_mu                7.311158
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu   3.891699
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu    7.683676
##                                                               z value Pr(>|z|)
## (Intercept)                                                    -2.529 0.011428
## Heatcontrol                                                     6.747 1.51e-11
## Trtmt_Dayrecov_2                                                0.001 0.999152
## Trtmt_Dayresist                                                 1.262 0.207090
## protegens1                                                      5.194 2.06e-07
## community_expected_mu                                           4.860 1.17e-06
## CommRich                                                        0.738 0.460809
## Heatcontrol:Trtmt_Dayrecov_2                                    0.303 0.761727
## Heatcontrol:Trtmt_Dayresist                                    -1.180 0.238030
## Heatcontrol:protegens1                                         -5.142 2.71e-07
## Trtmt_Dayrecov_2:protegens1                                    -0.163 0.870399
## Trtmt_Dayresist:protegens1                                     -1.532 0.125444
## Heatcontrol:community_expected_mu                              -4.883 1.04e-06
## Trtmt_Dayrecov_2:community_expected_mu                         -0.002 0.998423
## Trtmt_Dayresist:community_expected_mu                          -1.464 0.143101
## protegens1:community_expected_mu                               -4.115 3.88e-05
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                         0.017 0.986573
## Heatcontrol:Trtmt_Dayresist:protegens1                          1.405 0.160040
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu             -0.736 0.461967
## Heatcontrol:Trtmt_Dayresist:community_expected_mu               1.342 0.179622
## Heatcontrol:protegens1:community_expected_mu                    3.621 0.000294
## Trtmt_Dayrecov_2:protegens1:community_expected_mu               0.099 0.921305
## Trtmt_Dayresist:protegens1:community_expected_mu                1.223 0.221464
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu   0.259 0.795450
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu   -1.092 0.274927
##                                                                  
## (Intercept)                                                   *  
## Heatcontrol                                                   ***
## Trtmt_Dayrecov_2                                                 
## Trtmt_Dayresist                                                  
## protegens1                                                    ***
## community_expected_mu                                         ***
## CommRich                                                         
## Heatcontrol:Trtmt_Dayrecov_2                                     
## Heatcontrol:Trtmt_Dayresist                                      
## Heatcontrol:protegens1                                        ***
## Trtmt_Dayrecov_2:protegens1                                      
## Trtmt_Dayresist:protegens1                                       
## Heatcontrol:community_expected_mu                             ***
## Trtmt_Dayrecov_2:community_expected_mu                           
## Trtmt_Dayresist:community_expected_mu                            
## protegens1:community_expected_mu                              ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1                          
## Heatcontrol:Trtmt_Dayresist:protegens1                           
## Heatcontrol:Trtmt_Dayrecov_2:community_expected_mu               
## Heatcontrol:Trtmt_Dayresist:community_expected_mu                
## Heatcontrol:protegens1:community_expected_mu                  ***
## Trtmt_Dayrecov_2:protegens1:community_expected_mu                
## Trtmt_Dayresist:protegens1:community_expected_mu                 
## Heatcontrol:Trtmt_Dayrecov_2:protegens1:community_expected_mu    
## Heatcontrol:Trtmt_Dayresist:protegens1:community_expected_mu     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.6436734 0.6454696 0.7505564 0.6581071 0.6482629 0.7228183 0.8126409 0.6537034 0.6654285 0.6474962 0.2318472 0.5936928 0.4862218 0.3618208 0.291319 0.2314288 0.4074107 0.2597992 0.2229924 0.7390945 ...
summary(absDen_mods48h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   3788.0   3848.2  -1878.0   3756.0      303 
## 
## 
## Dispersion parameter for genpois family ():  621 
## 
## Conditional model:
##                                           Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              2.8331268  0.4532965   6.250  4.1e-10
## Heatcontrol                              4.3834846  0.3165532  13.848  < 2e-16
## Trtmt_Dayrecov_2                        -0.0003534  0.4081133  -0.001 0.999309
## Trtmt_Dayresist                         -0.1724414  0.7613290  -0.227 0.820812
## protegens1                               3.1809478  0.3757864   8.465  < 2e-16
## community_expected_mu                   -0.1011856  0.3844470  -0.263 0.792398
## CommRich                                 0.0851054  0.0880786   0.966 0.333922
## Heatcontrol:Trtmt_Dayrecov_2            -0.9651338  0.4435357  -2.176 0.029555
## Heatcontrol:Trtmt_Dayresist              0.0453817  0.7768166   0.058 0.953414
## Heatcontrol:protegens1                  -4.4852090  0.3581642 -12.523  < 2e-16
## Trtmt_Dayrecov_2:protegens1             -0.2098648  0.4375742  -0.480 0.631505
## Trtmt_Dayresist:protegens1              -3.1328542  0.8215695  -3.813 0.000137
## protegens1:CommRich                     -0.0721257  0.1048607  -0.688 0.491563
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  0.9523355  0.4997122   1.906 0.056680
## Heatcontrol:Trtmt_Dayresist:protegens1   3.1195962  0.8529193   3.658 0.000255
##                                            
## (Intercept)                             ***
## Heatcontrol                             ***
## Trtmt_Dayrecov_2                           
## Trtmt_Dayresist                            
## protegens1                              ***
## community_expected_mu                      
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2            *  
## Heatcontrol:Trtmt_Dayresist                
## Heatcontrol:protegens1                  ***
## Trtmt_Dayrecov_2:protegens1                
## Trtmt_Dayresist:protegens1              ***
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 .  
## Heatcontrol:Trtmt_Dayresist:protegens1  ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
                     AIC = sapply(absDen_mods6h, AIC) + sapply(absDen_mods12h, AIC) + sapply(absDen_mods24h, AIC) + sapply(absDen_mods48h, AIC),
                     AICc = sapply(absDen_mods6h, AICc) + sapply(absDen_mods12h, AICc) + sapply(absDen_mods24h, AICc) + sapply(absDen_mods48h, AICc),
                     BIC = sapply(absDen_mods6h, BIC) + sapply(absDen_mods12h, BIC) + sapply(absDen_mods24h, BIC) + sapply(absDen_mods48h, BIC)) %>%
            mutate(AIC = AIC/4,
                   AICc = AICc/4,
                   BIC = BIC/4) %>%
              mutate(dAIC = AIC - min(AIC),   # deltas relative to the best model (positive by convention)
                     dAICc = AICc - min(AICc),
                     dBIC = BIC - min(BIC))
meanIC %>% arrange(BIC)
meanIC %>% arrange(AIC)
# clean up
rm(meanIC)

Both AIC and BIC agree that the best model is “*prot*mu +CommRich”. However, this model is rather complex. I’m less concerned about the danger of over-fitting, since each data subset has between 360 and 393 observations (so we are still within the rule of thumb of 10 to 15 observations per parameter). But I am very concerned about the number of interactions in this complex model. Recall that its full formula is ~ Heat*Trtmt_Day*protegens*community_expected_mu + CommRich. So it has a 4-way interaction!!! YIKES!

I really don’t want to use a model with this many interactions because this usually leads to poor estimates (which cannot be fixed with a post-hoc analysis) or, at the very least, to parameter estimates that are very hard to interpret (see above).

I feel confident that the complex model “*prot*mu +CommRich” is quite close to the “Truth” because the short heat pulse data prefer the interaction Heat*Trtmt_Day*community_expected_mu while the long heat pulse data prefer the interaction Heat*Trtmt_Day*protegens. So I think this complex model is the only one that’s able to accommodate both of these effects across all heat pulse durations. The problem is that it’s too complex to be interpretable.

Therefore, I will use the BIC as the criterion for selecting a model that still captures the main effects but is much simpler. The model “*prot +mu + prot*CommRich” has a negligible \(\Delta\)BIC (recall the rule of thumb for BIC model selection: \(\Delta\)BIC \(<2\) is not worth mentioning). The \(\Delta\)AIC between this simpler model and the best one is substantial, but AIC favours the most complex models across the board anyway, which is expected because AIC penalizes model complexity less. Here we really are much more interested in BIC because we need to simplify our model.
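To make the \(\Delta\)BIC rule of thumb concrete, here is a small sketch using only the two 48h BIC values printed in the model summaries above (3847.9 and 3848.2). Note that the actual selection is based on the BIC averaged across all four data subsets (the meanIC table), so these numbers are for illustration only.

# Illustrative only: the two 48h BIC values reported in the summaries above
bic <- c("*prot*mu +CommRich" = 3847.9, "*prot +mu + prot*CommRich" = 3848.2)

# Delta-BIC relative to the best (lowest-BIC) model; positive by convention
dBIC <- bic - min(bic)
dBIC

# by the rule of thumb, any model with Delta-BIC < 2 is effectively
# equivalent to the best model, so the simpler model is a defensible choice
names(dBIC)[dBIC < 2]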

I think I should modify the structure of the analysis as follows:

Effect sizes for complex model

First, check whether the most complex model is behaving sensibly here:

emmeans(absDen_mods48h[["*prot*mu +CommRich"]],
                   ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu,
                   data = absDen_48h, type = "response")
## CommRich = 2.15, Trtmt_Day = recov_1, protegens = 0, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48          18.8   6.63 Inf      9.40      37.5
##  control   1412.9 166.00 Inf   1122.45    1778.5
## 
## CommRich = 2.15, Trtmt_Day = recov_2, protegens = 0, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48          18.7   6.62 Inf      9.38      37.5
##  control    531.9  79.80 Inf    396.32     713.9
## 
## CommRich = 2.15, Trtmt_Day = resist, protegens = 0, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48          17.2  18.10 Inf      2.18     135.8
##  control   1241.3 147.00 Inf    984.02    1565.8
## 
## CommRich = 2.15, Trtmt_Day = recov_1, protegens = 1, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48         367.5  47.10 Inf    285.84     472.6
##  control    331.2  45.10 Inf    253.62     432.4
## 
## CommRich = 2.15, Trtmt_Day = recov_2, protegens = 1, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48         296.4  39.20 Inf    228.68     384.1
##  control    263.7  35.90 Inf    201.95     344.4
## 
## CommRich = 2.15, Trtmt_Day = resist, protegens = 1, community_expected_mu = 0.893:
##  Heat    response     SE  df asymp.LCL asymp.UCL
##  48          13.5   4.04 Inf      7.47      24.2
##  control    285.1  38.00 Inf    219.59     370.2
## 
## Unknown transformation "as.integer": no transformation done 
## Confidence level used: 0.95

Let’s check whether the effect sizes of the complex model are consistent with those of the simpler model.

# plot the effect size contingent on protegens
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_6h),
                                sigma(absDen_mods6h[["*prot*mu +CommRich"]]),
                                edf = df.residual(absDen_mods6h[["*prot*mu +CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_12h),
                                sigma(absDen_mods12h[["*prot*mu +CommRich"]]),
                                edf = df.residual(absDen_mods12h[["*prot*mu +CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_24h),
                                sigma(absDen_mods24h[["*prot*mu +CommRich"]]),
                                edf = df.residual(absDen_mods24h[["*prot*mu +CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot*mu +CommRich"]], ~ Heat | CommRich + Trtmt_Day*protegens*community_expected_mu, data = absDen_48h),
                                sigma(absDen_mods48h[["*prot*mu +CommRich"]]),
                                edf = df.residual(absDen_mods48h[["*prot*mu +CommRich"]]))


# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
  data.frame(Heat = heat_trtmt,
             CommRich = confint(eff_size_object)[[2]],
             Trtmt_Day = confint(eff_size_object)[[3]],
             protegens = confint(eff_size_object)[[4]],
             community_expected_mu = confint(eff_size_object)[[5]],
             effect_est = confint(eff_size_object)[[6]],
             effect_loCI = confint(eff_size_object)[[9]],
             effect_hiCI = confint(eff_size_object)[[10]])
}

# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
                              get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
                              get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
                              get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
                              get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

#plot
ggplot(productivity_protegens,
       aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       shape = "protegens\npresent?",
       title = "*prot*mu +CommRich")

# But we are not interested in the details of protegens. Let's do the post-hoc by averaging across the effects of protegens.

posthoc_6h <- emmeans(effect_6h_protegens,
                      pairwise ~ Trtmt_Day,
                      data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens,
                       pairwise ~ Trtmt_Day,
                       data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens,
                       pairwise ~ Trtmt_Day,
                       data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens,
                       pairwise ~ Trtmt_Day,
                       data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object *WITHOUT* protegens
get_posthoc_NOprot <- function(posthoc_object, heat_trtmt) {
  output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
              data.frame() %>%
                select(-df)
  colnames(output)[2:6] <- c("est", "SE", "loCI", "hiCI", "groups")
  output$Heat <- heat_trtmt
  return(output)
}
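# sanity check on the Bonferroni correction used above (illustration only):
# with 4 heat-pulse durations, the per-comparison alpha is 0.05/4 = 0.0125,
# which corresponds to the 98.75% confidence level requested further down
# via `level = 0.9875`
0.05 / 4      # 0.0125
1 - 0.05 / 4  # 0.9875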

# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
                              get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
                              get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
                              get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
                              get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))

# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot
ggplot(productivity_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-0.009, label=groups)) +
  #scale_x_continuous(breaks=c(-0.006, -0.003, 0), limits=c(-0.01, 0.003)) + 
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       title = "Averaged across protegens (with extinct reps)")

Okay so it seems that adding the community growth rate as a predictor kinda changes everything. We still see decoupling between resistance and recovery but now the time frame of it varies depending on community composition. For the communities without protegens, decoupling happens sooner (e.g., between resistance and early recovery at 12h), is maximized sooner (at 24h), and then disappears completely by 48h (i.e., because of extinction). On the other hand, we already saw in the extinction analysis above that communities with protegens are protected from extinction for all heat pulse durations investigated here. For these communities, decoupling only begins to happen much later (at 48h). Although these results are more complex than what I had previously reported, they make more sense.

Simpler model: Calculate effect sizes conditional on protegens

Recall that we are NOT interested in reporting the results from the most complex model. Are the effect sizes of the preferred, less complex model (“*prot +mu + prot*CommRich”) consistent with those above?

# plot the effect size contingent on protegens
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_6h),
                                sigma(absDen_mods6h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods6h[["*prot +mu + prot*CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_12h),
                                sigma(absDen_mods12h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods12h[["*prot +mu + prot*CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_24h),
                                sigma(absDen_mods24h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods24h[["*prot +mu + prot*CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_48h),
                                sigma(absDen_mods48h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods48h[["*prot +mu + prot*CommRich"]]))

# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
  data.frame(Heat = heat_trtmt,
             CommRich = confint(eff_size_object)[[5]],
             Trtmt_Day = confint(eff_size_object)[[2]],
             protegens = confint(eff_size_object)[[3]],
             community_expected_mu = confint(eff_size_object)[[4]],
             effect_est = confint(eff_size_object)[[6]],
             effect_loCI = confint(eff_size_object)[[9]],
             effect_hiCI = confint(eff_size_object)[[10]])
}

# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
                              get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
                              get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
                              get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
                              get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

#plot
ggplot(productivity_protegens,
       aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       shape = "protegens\npresent?",
       title = "*prot +mu + prot*CommRich")

# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# a function that extracts the confidence intervals from a post-hoc object *WITH* protegens
get_posthoc_YESprot <- function(posthoc_object, heat_trtmt) {
  output <- multcomp::cld(posthoc_object, alpha=0.05/4, Letters = letters) %>%
              data.frame() %>%
                select(-df)
  colnames(output)[3:7] <- c("est", "SE", "loCI", "hiCI", "groups")
  output$Heat <- heat_trtmt
  return(output)
}

# create a data.frame for plotting
prod_effects_protegens <- data.frame()
prod_effects_protegens <- rbind(prod_effects_protegens,
                               get_posthoc_YESprot(posthocPROT_6h, heat_trtmt = 6),
                               get_posthoc_YESprot(posthocPROT_12h, heat_trtmt = 12),
                               get_posthoc_YESprot(posthocPROT_24h, heat_trtmt = 24),
                               get_posthoc_YESprot(posthocPROT_48h, heat_trtmt = 48))

## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthocPROT_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
                  get_posthoc_YESprot(posthocPROT_6h_WIDER, heat_trtmt = 6),
                  get_posthoc_YESprot(posthocPROT_12h_WIDER, heat_trtmt = 12),
                  get_posthoc_YESprot(posthocPROT_24h_WIDER, heat_trtmt = 24),
                  get_posthoc_YESprot(posthocPROT_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[5:6] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
prod_effects_protegens <- inner_join(prod_effects_protegens,
                                     widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, protegens, est, groups, Heat)`
rm(widerCIs)


# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
prod_effects_protegens$Trtmt_Day <- factor(prod_effects_protegens$Trtmt_Day,
                                          levels = c("resist", "recov_1", "recov_2"))
levels(prod_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(prod_effects_protegens,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=protegens)) +
  facet_grid( ~ protegens) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-0.0035, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y="Heat duration",
       shape = "protegens\npresent?",
       title = "*prot +mu + prot*CommRich")

Decoupling conditional on protegens

###########################
# first define some functions
###########################

# a function that takes the multcomp::cld letters from 2 groups and returns TRUE when no letters are shared (or FALSE when any letter is shared)
are_groups_different <- function(group1, group2) {
  # convert the groups columns into TRUE/FALSE columns indicating significant difference between resistance and recovery effect sizes
  first_group <- group1 %>%
                  # remove any white space
                  str_trim() %>%
                    # split the string up into single characters
                    strsplit(x=., split = character(0))
  second_group <- group2 %>%
                  # remove any white space
                  str_trim() %>%
                    # split the string up into single letters
                    strsplit(x=., split = character(0))
  # test if any letters are common. If there are, then they are NOT different so return FALSE (and vice versa).
  return( !any(first_group[[1]] %in% second_group[[1]]) )
}
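# quick sanity checks on are_groups_different() with made-up letter groups:
are_groups_different("ab", "bc")  # share the letter "b", so NOT different: FALSE
are_groups_different("a", "bc")   # no shared letters, so different: TRUE
are_groups_different(" a ", "ab") # str_trim() handles cld()'s padding: FALSE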

# a function to calculate distance from the point (x, y) to the line y = x: positive values are ABOVE the line and negative values are BELOW the line.
# this is used to calculate decoupling
dist_to_xyline <- function(x, y) {
  (y - x) / sqrt(2)  # distance formula derived from y = x line
}
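# spot checks on the signed distance (exact magnitude is 1/sqrt(2) ~ 0.707):
dist_to_xyline(0, 1)  # above the y=x line: +0.707
dist_to_xyline(1, 0)  # below the y=x line: -0.707
dist_to_xyline(2, 2)  # on the line: 0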

# a function to estimate mean decoupling and its confidence intervals, given the mean and SYMMETRIC confidence intervals for resistance and recovery.
# Note that using the univariate confidence intervals here assumes there is no correlation between resistance and recovery (which is exactly the opposite of the whole point of coupling)
# ...also, beware that the CIs come from a post hoc, so they are more conservative than the raw CIs...
estimate_decoupling <- function(resist_est, resist_hiCI,
                                recov_est, recov_hiCI) {
  # check the input values
  if(resist_hiCI < resist_est)
    stop("`resist_hiCI` must be the *UPPER* confidence interval on resistance.")
  if(recov_hiCI < recov_est)
    stop("`recov_hiCI` must be the *UPPER* confidence interval on recovery.")
  
  # get the co-ordinates that define the ellipse
  x0 <- resist_est # x-coordinate of the center of the ellipse
  y0 <- recov_est # y-coordinate of the center of the ellipse
  a <- resist_hiCI - resist_est # horizontal semi-axis (radius in x)
  b <- recov_hiCI - recov_est # vertical semi-axis (radius in y)
  
  # generate points on the perimeter of the ellipse
  theta <- seq(0, 2 * pi, length.out = 360)  # angles
  x_ellipse <- x0 + a * cos(theta)  # x-coordinates on the ellipse
  y_ellipse <- y0 + b * sin(theta)   # y-coordinates on the ellipse
  
  # decoupling measures the distance between the point and the y=x line
  est <- dist_to_xyline(x0, y0)

  # do the same for all points on the ellipse defining the CI
  distances <- dist_to_xyline(x_ellipse, y_ellipse)

  # maximum and minimum distances define the hiCI and loCI, respectively
  hiCI <- max(distances)
  loCI <- min(distances)

  return(c(est_decoupling = est, loCI_decoupling = loCI, hiCI_decoupling = hiCI))
}
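A quick worked check of the ellipse projection (a standalone recomputation mirroring `estimate_decoupling(1, 2, 1, 2)`): with resistance = recovery = 1 and symmetric CI half-widths of 1, the point sits on the y = x line, so the decoupling estimate is exactly 0 and its CI spans roughly -1 to +1 (the "ellipse" is a unit circle).

```r
# standalone recomputation of estimate_decoupling(1, 2, 1, 2)
dist_to_xyline <- function(x, y) (y - x) / sqrt(2)  # signed distance to y = x
theta <- seq(0, 2 * pi, length.out = 360)
x_e <- 1 + 1 * cos(theta)  # unit circle centred on (1, 1)
y_e <- 1 + 1 * sin(theta)
d <- dist_to_xyline(x_e, y_e)
c(est = dist_to_xyline(1, 1), loCI = min(d), hiCI = max(d))
# est is exactly 0; loCI and hiCI are ~ -1 and ~ +1 (up to the 360-point grid)
```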


#############################
# Decoupling
#############################

decoupling_productivity <- prod_effects_protegens %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
                              rename(loCI = loCI_bonAlpha,
                                     hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")

# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
                              pivot_wider(names_from = Trtmt_Day,
                                          values_from = c(est, loCI, hiCI, SE, groups))
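For reference, `pivot_wider()` with several `values_from` columns crosses each value column with each `Trtmt_Day` level, which is where column names like `est_resist` and `hiCI_late_recov` come from. A toy sketch:

```r
library(tidyr)
# toy long-format table with two value columns and two Trtmt_Day levels
toy <- tibble::tibble(Heat      = c(6, 6),
                      Trtmt_Day = c("resist", "early_recov"),
                      est       = c(0.10, -0.20),
                      loCI      = c(0.05, -0.30))
wide <- pivot_wider(toy, names_from = Trtmt_Day, values_from = c(est, loCI))
names(wide)
# "Heat" "est_resist" "est_early_recov" "loCI_resist" "loCI_early_recov"
```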

# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
                                                        decoupling_productivity$groups_early_recov,
                                                        decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
                                                       decoupling_productivity$groups_late_recov,
                                                       decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)


# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0045, 0.0045), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (with extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Early Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).

# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
    facet_grid(~protegens) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    # geom_ellipse() comes from the ggforce package
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_early_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_early_recov - est_early_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (with extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Early Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0036, 0.0036), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (with extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Late Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_errorbarh()`).

# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
    facet_grid(~protegens) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_late_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_late_recov - est_late_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (with extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Late Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# finally estimate decoupling by getting the distance to the y=x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_early_recov,
                                  recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
                          early_decoupling)

ggplot(early_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  labs(title = "Early recovery (WITH extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_late_recov,
                                  recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
                          late_decoupling)

ggplot(late_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  scale_colour_viridis_d(option = "viridis", end=0.85) +
  labs(title = "Late recovery (WITH extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

Simpler model: Calculate the effect sizes marginal on protegens

##############################
# effect sizes with protegens as non-focal
##############################
# But we are not interested in the details of protegens here, so let's run the post hoc again, this time averaging over the protegens effects.

posthoc_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
                              get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
                              get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
                              get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
                              get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))


## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below (level = 1 - 0.05/4 = 0.9875, i.e., Bonferroni-corrected over the 4 heat durations):
posthoc_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
                  get_posthoc_NOprot(posthoc_6h_WIDER, heat_trtmt = 6),
                  get_posthoc_NOprot(posthoc_12h_WIDER, heat_trtmt = 12),
                  get_posthoc_NOprot(posthoc_24h_WIDER, heat_trtmt = 24),
                  get_posthoc_NOprot(posthoc_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[4:5] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
productivity_effects <- inner_join(productivity_effects,
                                   widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, est, groups, Heat)`
rm(widerCIs)

# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(productivity_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-0.008, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       title = "protegens as non-focal predictor (i.e., marginalized)")

#######
# finally, we will do a series of pairwise two-tailed t-tests to compare between heat durations
#######

# estimate the sample sizes
temp <- productivity_effects # copy the effects to temp
productivity_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(absDen_6h)),
                     temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(absDen_12h)),
                     temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(absDen_24h)),
                     temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(absDen_48h)))
rm(temp)

# estimate the SD from the SE
productivity_effects <- productivity_effects %>% mutate(SD = SE * sqrt(n)) %>%
    # re-order by Heat and Trtmt_Day
                          arrange(Heat, Trtmt_Day)

# all pairwise combinations comparing the same treatment day across different heat durations
# (after arrange(Heat, Trtmt_Day), rows 1, 4, 7, 10 hold the first Trtmt_Day level for the four heat durations; offsetting by +1 and +2 picks out the other two levels)
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
# loop through all the combinations and do the t-tests
prodEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
  prodEffects_ttests <- rbind(prodEffects_ttests,
                             run_ttest(row_x = combos[i,1],
                                       row_y = combos[i,2],
                                       summary_stats_df = productivity_effects))
}
prodEffects_ttests$adjusted_p <- p.adjust(prodEffects_ttests$pvalue, method = "bonferroni")
prodEffects_ttests$Trtmt_Day <- productivity_effects$Trtmt_Day[combos[,1]]
prodEffects_ttests$Heat_1 <- productivity_effects$Heat[combos[,1]]
prodEffects_ttests$Heat_2 <- productivity_effects$Heat[combos[,2]]

print(prodEffects_ttests)
##     t_statistic       df       pvalue   adjusted_p      Trtmt_Day Heat_1 Heat_2
## t   10.47327399 30.75149 1.154061e-11 2.077310e-10     Resistance      6     12
## t1  21.17673146 28.53041 5.417906e-19 9.752231e-18     Resistance      6     24
## t2  32.89317973 18.11665 1.306276e-17 2.351297e-16     Resistance      6     48
## t3   6.85581717 22.61200 5.966686e-07 1.074004e-05     Resistance     12     24
## t4  27.60250117 19.01350 8.356449e-17 1.504161e-15     Resistance     12     48
## t5  26.05248055 16.16656 1.238834e-14 2.229901e-13     Resistance     24     48
## t6  -0.07249429 32.00521 9.426597e-01 1.000000e+00 Early Recovery      6     12
## t7   8.26687963 27.56078 6.068705e-09 1.092367e-07 Early Recovery      6     24
## t8  40.91143292 28.29271 1.011875e-26 1.821374e-25 Early Recovery      6     48
## t9   8.34414769 23.60875 1.687811e-08 3.038061e-07 Early Recovery     12     24
## t10 40.94532357 27.33141 4.530209e-26 8.154376e-25 Early Recovery     12     48
## t11 40.28429928 20.10302 1.059894e-20 1.907809e-19 Early Recovery     24     48
## t12 14.19702257 31.87762 2.481735e-15 4.467122e-14  Late Recovery      6     12
## t13 10.92229280 24.24178 7.582814e-11 1.364907e-09  Late Recovery      6     24
## t14 35.63914918 29.27681 1.139762e-25 2.051572e-24  Late Recovery      6     48
## t15 -7.44066706 20.33961 3.153933e-07 5.677079e-06  Late Recovery     12     24
## t16 23.04010331 28.32362 6.877370e-20 1.237927e-18  Late Recovery     12     48
## t17 33.76923521 18.60434 3.733642e-18 6.720555e-17  Late Recovery     24     48
# these p-values seem overly optimistic. Use alpha = 1*10^-3
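`run_ttest()` (defined earlier in this document) compares two effect sizes from their summary statistics; the core computation is a Welch two-sample t-test built from the means, SDs, and sample sizes. A minimal standalone sketch of that idea (the helper name `welch_from_summary` is hypothetical):

```r
# Welch t-test from summary statistics (hypothetical helper, for illustration)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  se2    <- s1^2 / n1 + s2^2 / n2  # squared SE of the difference in means
  t_stat <- (m1 - m2) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  p  <- 2 * pt(-abs(t_stat), df)   # two-tailed p-value
  c(t_statistic = t_stat, df = df, pvalue = p)
}
welch_from_summary(0, 1, 10, 1, 1, 10)
# equal SDs and n reduce df to 2 * (n - 1) = 18, and t = -1 / sqrt(0.2)
```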

Awesome! The effect sizes of the simpler model are indeed consistent with those from the complex model. We will focus on the parameter estimates and effect sizes from the simpler model.

Decoupling marginal on protegens

decoupling_productivity <- productivity_effects %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
                              rename(loCI = loCI_bonAlpha,
                                     hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")

# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
                            select(-n, -SD) %>%
                              pivot_wider(names_from = Trtmt_Day,
                                          values_from = c(est, loCI, hiCI, SE, groups))

# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
                                                        decoupling_productivity$groups_early_recov,
                                                        decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
                                                       decoupling_productivity$groups_late_recov,
                                                       decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)


# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
  #facet_grid(~CommRich) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0045, 0.0045), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (with extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Early Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_early_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_early_recov - est_early_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (with extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Early Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.008, 0.008), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0036, 0.0036), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (with extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Late Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_late_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_late_recov - est_late_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (with extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Late Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# finally estimate decoupling by getting the distance to the y=x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_early_recov,
                                  recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
                          early_decoupling)

ggplot(early_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  labs(title = "Early recovery (WITH extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_late_recov,
                                  recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
                          late_decoupling)

ggplot(late_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  scale_colour_viridis_d(option = "viridis", end=0.85) +
  labs(title = "Late recovery (WITH extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

# clean up
rm(absDen_6h, absDen_12h, absDen_24h, absDen_48h, absDen_mods6h, absDen_mods12h, absDen_mods24h, absDen_mods48h,
   combos, decoupling_productivity, early_decoupling, late_decoupling,
   effect_6h, effect_12h, effect_24h, effect_48h, effect_6h_protegens, effect_12h_protegens, effect_24h_protegens, effect_48h_protegens,
   posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h, posthoc_6h_WIDER, posthoc_12h_WIDER, posthoc_24h_WIDER, posthoc_48h_WIDER, posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthocPROT_6h_WIDER, posthocPROT_12h_WIDER, posthocPROT_24h_WIDER, posthocPROT_48h_WIDER,
   prod_effects_protegens, prodEffects_ttests, productivity_effects, productivity_protegens)

Repeat analysis removing extinct replicates

Let’s remove the extinct replicates to focus just on the data where the communities survived.

# add a column indicating whether the replicate survived
  # but first we need to drop $Heat: it is a factor in the density data but numeric in the extinction data, so the *_join() would fail on incompatible key types
tmp_div <- absDen_forFit %>% select(-Heat)
tmp_div <- inner_join(tmp_div,
                      extinct.df %>% select(uniqID, survived),
                      by = c("uniqID"))
absDen_forFit$survived <- tmp_div$survived
rm(tmp_div)
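Why `Heat` has to be dropped before joining: dplyr's `*_join()` functions refuse to match key columns of incompatible types (here, factor in one table vs. numeric in the other), erroring rather than silently coercing. A toy illustration:

```r
library(dplyr)
# two toy tables whose shared key differs in type: factor vs. double
a <- tibble::tibble(Heat = factor(c("6", "control")), x = 1:2)
b <- tibble::tibble(Heat = c(6, 12),                  y = 3:4)
joined <- tryCatch(inner_join(a, b, by = "Heat"),
                   error = function(e) "incompatible key types")
joined
# the join errors out, so the factor key never gets silently coerced
```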

# keep just the diversity values that did not go extinct
absDen_forFit <- absDen_forFit %>% filter(survived == 1)

####################
# 6h heat duration
####################
# grab just the treatment with its associated control data
absDen_6h <- rbind(absDen_forFit %>% filter(Heat == "6"),
                   absDen_forFit %>% filter(Heat == "control", Day < 4))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_6h$Trtmt_Day <- "resist"
absDen_6h$Trtmt_Day[absDen_6h$Day == 2] <- "recov_1"
absDen_6h$Trtmt_Day[absDen_6h$Day == 3] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_6h$Trtmt_Day <- as.factor(absDen_6h$Trtmt_Day)
absDen_6h$Heat <- droplevels(absDen_6h$Heat)
absDen_6h$resistant <- as.factor(absDen_6h$resistant)
absDen_6h$protegens <- as.factor(absDen_6h$protegens)

# fit different models:
absDen_mods6h <- fit_productivity_models(absDen_6h)

# check the simplest possible models for multicolinearity
check_collinearity(absDen_mods6h[["simple"]])
check_collinearity(absDen_mods6h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods6h, npar_of_glmmTMB_fit),
                           AIC = sapply(absDen_mods6h, AIC),
                           AICc = sapply(absDen_mods6h, AICc),
                           BIC = sapply(absDen_mods6h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,  # note: deltas are min - value, so the best model scores 0 and worse models are negative
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
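Assuming `AICc()` implements the standard small-sample correction to AIC, for k parameters and n observations AICc = AIC + 2k(k+1)/(n - k - 1). A standalone sketch (the helper name `aicc_from_aic` is hypothetical), using k = 16 and n = 392 as in the 6h model summary above:

```r
# small-sample correction to AIC (hypothetical helper, for illustration)
aicc_from_aic <- function(aic, k, n) aic + 2 * k * (k + 1) / (n - k - 1)
aicc_from_aic(100, 16, 392)
# with n >> k the correction is small: 2*16*17/375 is about 1.45
```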
# plot the best model for 6h:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*mu +prot"))

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot*mu +CommRich"))

# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods6h, mod_name="*prot +mu + prot*CommRich"))

# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods6h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods6h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods6h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.624 0.96 0.724 0.6761063 0.992 0.872 0.584 0.02 0.916 0.4455416 0.964 1 0.752 0.4 0.864 0.4146115 0.112 0.5655875 0.5192903 0.572 ...
summary(absDen_mods6h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   5662.5   5726.0  -2815.2   5630.5      376 
## 
## 
## Dispersion parameter for genpois family ():  368 
## 
## Conditional model:
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.839618   0.228850   34.26  < 2e-16
## Heatcontrol                             -0.180861   0.118842   -1.52   0.1280
## Trtmt_Dayrecov_2                        -0.018307   0.097866   -0.19   0.8516
## Trtmt_Dayresist                         -0.008151   0.101418   -0.08   0.9359
## protegens1                              -1.304007   0.181747   -7.17 7.24e-13
## community_expected_mu                   -0.491750   0.214502   -2.29   0.0219
## CommRich                                -0.003971   0.044970   -0.09   0.9296
## Heatcontrol:Trtmt_Dayrecov_2            -0.137528   0.167598   -0.82   0.4119
## Heatcontrol:Trtmt_Dayresist              0.174712   0.164347    1.06   0.2878
## Heatcontrol:protegens1                  -0.061370   0.180549   -0.34   0.7339
## Trtmt_Dayrecov_2:protegens1             -0.333568   0.165252   -2.02   0.0435
## Trtmt_Dayresist:protegens1               0.061134   0.164067    0.37   0.7094
## protegens1:CommRich                      0.032276   0.064557    0.50   0.6171
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  0.075049   0.261004    0.29   0.7737
## Heatcontrol:Trtmt_Dayresist:protegens1   0.251918   0.250033    1.01   0.3137
##                                            
## (Intercept)                             ***
## Heatcontrol                                
## Trtmt_Dayrecov_2                           
## Trtmt_Dayresist                            
## protegens1                              ***
## community_expected_mu                   *  
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2               
## Heatcontrol:Trtmt_Dayresist                
## Heatcontrol:protegens1                     
## Trtmt_Dayrecov_2:protegens1             *  
## Trtmt_Dayresist:protegens1                 
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1    
## Heatcontrol:Trtmt_Dayresist:protegens1     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 12h heat duration
####################
# grab just the treatment with its associated control data
absDen_12h <- rbind(absDen_forFit %>% filter(Heat == "12", Day > 1),
                       absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_12h$Trtmt_Day <- "resist"
absDen_12h$Trtmt_Day[absDen_12h$Day == 3] <- "recov_1"
absDen_12h$Trtmt_Day[absDen_12h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_12h$Trtmt_Day <- as.factor(absDen_12h$Trtmt_Day)
absDen_12h$Heat <- droplevels(absDen_12h$Heat)
absDen_12h$resistant <- as.factor(absDen_12h$resistant)
absDen_12h$protegens <- as.factor(absDen_12h$protegens)

# fit different models:
absDen_mods12h <- fit_productivity_models(absDen_12h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods12h[["simple"]])
check_collinearity(absDen_mods12h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods12h, npar_of_glmmTMB_fit),
                           AIC = sapply(absDen_mods12h, AIC),
                           AICc = sapply(absDen_mods12h, AICc),
                           BIC = sapply(absDen_mods12h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for 12h:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*mu +prot"))

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot*mu +CommRich"))

# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods12h, mod_name="*prot +mu + prot*CommRich"))

# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods12h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods12h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods12h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.876 0.632 0.2287389 0.686251 0.2893171 0.468 0.08077386 0.628 0.82 0.42 0.672 0.708 0.532 0.4667235 0.568 0.584 0.4163186 0.3632073 0.5082269 0.92 ...
summary(absDen_mods12h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   4935.9   4997.8  -2452.0   4903.9      337 
## 
## 
## Dispersion parameter for genpois family ():  446 
## 
## Conditional model:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.01699    0.32692  21.464  < 2e-16
## Heatcontrol                             -0.21263    0.13850  -1.535   0.1247
## Trtmt_Dayrecov_2                        -0.37577    0.14055  -2.674   0.0075
## Trtmt_Dayresist                         -0.86993    0.16953  -5.131 2.88e-07
## protegens1                              -1.25584    0.23759  -5.286 1.25e-07
## community_expected_mu                    0.07319    0.29296   0.250   0.8027
## CommRich                                 0.11143    0.07060   1.578   0.1145
## Heatcontrol:Trtmt_Dayrecov_2             0.51493    0.20295   2.537   0.0112
## Heatcontrol:Trtmt_Dayresist              1.11031    0.21161   5.247 1.55e-07
## Heatcontrol:protegens1                  -0.09314    0.20466  -0.455   0.6490
## Trtmt_Dayrecov_2:protegens1              0.18835    0.20302   0.928   0.3535
## Trtmt_Dayresist:protegens1               0.89448    0.21920   4.081 4.49e-05
## protegens1:CommRich                     -0.08730    0.08660  -1.008   0.3134
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 -0.17894    0.29796  -0.601   0.5481
## Heatcontrol:Trtmt_Dayresist:protegens1  -0.71900    0.29616  -2.428   0.0152
##                                            
## (Intercept)                             ***
## Heatcontrol                                
## Trtmt_Dayrecov_2                        ** 
## Trtmt_Dayresist                         ***
## protegens1                              ***
## community_expected_mu                      
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2            *  
## Heatcontrol:Trtmt_Dayresist             ***
## Heatcontrol:protegens1                     
## Trtmt_Dayrecov_2:protegens1                
## Trtmt_Dayresist:protegens1              ***
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1    
## Heatcontrol:Trtmt_Dayresist:protegens1  *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 24h heat duration
####################
# grab just the treatment with its associated control data
absDen_24h <- rbind(absDen_forFit %>% filter(Heat == "24", Day > 1),
                       absDen_forFit %>% filter(Heat == "control", Day > 1, Day != 5))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_24h$Trtmt_Day <- "resist"
absDen_24h$Trtmt_Day[absDen_24h$Day == 3] <- "recov_1"
absDen_24h$Trtmt_Day[absDen_24h$Day == 4] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_24h$Trtmt_Day <- as.factor(absDen_24h$Trtmt_Day)
absDen_24h$Heat <- droplevels(absDen_24h$Heat)
absDen_24h$resistant <- as.factor(absDen_24h$resistant)
absDen_24h$protegens <- as.factor(absDen_24h$protegens)

# fit different models:
absDen_mods24h <- fit_productivity_models(absDen_24h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods24h[["simple"]])
check_collinearity(absDen_mods24h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods24h, npar_of_glmmTMB_fit),
                           AIC = sapply(absDen_mods24h, AIC),
                           AICc = sapply(absDen_mods24h, AICc),
                           BIC = sapply(absDen_mods24h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for the 24h data without extinctions:
print(plot_model_pred.CommRich(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot*mu +CommRich"))

# plot the preferred model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods24h, mod_name="*prot +mu + prot*CommRich"))

# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods24h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods24h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods24h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.5483252 0.384947 0.332 0.6092959 0.688 0.5308852 0.464 0.2199125 0.2281527 0.2535313 0.7 0.752 0.5366702 0.144 0.2 0.2025676 0.4616551 0.284 0.076 0.2775874 ...
summary(absDen_mods24h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   4891.5   4953.2  -2429.8   4859.5      334 
## 
## 
## Dispersion parameter for genpois family ():  769 
## 
## Conditional model:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              4.71414    0.31230  15.095  < 2e-16
## Heatcontrol                              1.18929    0.24211   4.912 9.01e-07
## Trtmt_Dayrecov_2                         2.04329    0.21341   9.574  < 2e-16
## Trtmt_Dayresist                         -1.58085    0.27065  -5.841 5.19e-09
## protegens1                               0.98405    0.24890   3.954 7.70e-05
## community_expected_mu                    0.70145    0.32696   2.145 0.031924
## CommRich                                 0.32409    0.07376   4.394 1.11e-05
## Heatcontrol:Trtmt_Dayrecov_2            -1.86530    0.27428  -6.801 1.04e-11
## Heatcontrol:Trtmt_Dayresist              1.92680    0.33187   5.806 6.40e-09
## Heatcontrol:protegens1                  -1.74930    0.28922  -6.048 1.46e-09
## Trtmt_Dayrecov_2:protegens1             -2.47075    0.26515  -9.318  < 2e-16
## Trtmt_Dayresist:protegens1               1.10425    0.30938   3.569 0.000358
## protegens1:CommRich                     -0.31366    0.08945  -3.507 0.000454
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  2.44064    0.35658   6.845 7.67e-12
## Heatcontrol:Trtmt_Dayresist:protegens1  -1.04441    0.39442  -2.648 0.008098
##                                            
## (Intercept)                             ***
## Heatcontrol                             ***
## Trtmt_Dayrecov_2                        ***
## Trtmt_Dayresist                         ***
## protegens1                              ***
## community_expected_mu                   *  
## CommRich                                ***
## Heatcontrol:Trtmt_Dayrecov_2            ***
## Heatcontrol:Trtmt_Dayresist             ***
## Heatcontrol:protegens1                  ***
## Trtmt_Dayrecov_2:protegens1             ***
## Trtmt_Dayresist:protegens1              ***
## protegens1:CommRich                     ***
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 ***
## Heatcontrol:Trtmt_Dayresist:protegens1  ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# 48h heat duration
####################
# grab just the treatment with its associated control data
absDen_48h <- rbind(absDen_forFit %>% filter(Heat == "48", Day > 2),
                       absDen_forFit %>% filter(Heat == "control", Day > 2))
# create a column for last day of heat, first day of recovery, and last day of recovery
absDen_48h$Trtmt_Day <- "resist"
absDen_48h$Trtmt_Day[absDen_48h$Day == 4] <- "recov_1"
absDen_48h$Trtmt_Day[absDen_48h$Day == 5] <- "recov_2"
# appropriately distinguish between numbers and factors
absDen_48h$Trtmt_Day <- as.factor(absDen_48h$Trtmt_Day)
absDen_48h$Heat <- droplevels(absDen_48h$Heat)
absDen_48h$resistant <- as.factor(absDen_48h$resistant)
absDen_48h$protegens <- as.factor(absDen_48h$protegens)

# fit different models:
absDen_mods48h <- fit_productivity_models(absDen_48h)

# check the simplest possible models for multicollinearity
check_collinearity(absDen_mods48h[["simple"]])
check_collinearity(absDen_mods48h[["simple resist"]])
# print a summary table of the model fits
data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
                           AIC = sapply(absDen_mods48h, AIC),
                           AICc = sapply(absDen_mods48h, AICc),
                           BIC = sapply(absDen_mods48h, BIC)) %>%
                    mutate(dAIC = min(AIC)-AIC,
                           dAICc = min(AICc)-AICc,
                           dBIC = min(BIC)-BIC) %>% arrange(BIC)
# plot the best model for 48h:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="+CommRich +prot"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the best model for the complete data:
print(plot_model_pred.MU(mod_list=absDen_mods48h, mod_name="*prot*mu +CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# plot the preferred model for the complete data:
print(plot_model_pred.CommRich(mod_list=absDen_mods48h, mod_name="*prot +mu + prot*CommRich"))
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.

# check the fit and estimates of the best model for the complete data:
#simulateResiduals(fittedModel = absDen_mods48h[["*prot*mu +CommRich"]], plot = TRUE)
#summary(absDen_mods48h[["*prot*mu +CommRich"]])
# check the fit and estimates of the preferred model for the complete data:
simulateResiduals(fittedModel = absDen_mods48h[["*prot +mu + prot*CommRich"]], plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.6503375 0.6930755 0.8098628 0.6608687 0.6388946 0.7922922 0.6865314 0.540222 0.5938006 0.5679162 0.5248042 0.005801423 0.1950582 0.3731587 0.3880367 0.1832986 0.3625339 0.5414847 0.3098432 0.731147 ...
summary(absDen_mods48h[["*prot +mu + prot*CommRich"]])
##  Family: genpois  ( log )
## Formula:          
## as.integer(TotDensity_scale * 1000) ~ Heat * Trtmt_Day * protegens +  
##     community_expected_mu + protegens * CommRich
## Data: data_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   3482.6   3541.0  -1725.3   3450.6      269 
## 
## 
## Dispersion parameter for genpois family ():  273 
## 
## Conditional model:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.87541    0.36970  21.302  < 2e-16
## Heatcontrol                             -0.05482    0.15455  -0.355 0.722804
## Trtmt_Dayrecov_2                        -0.21690    0.17239  -1.258 0.208332
## Trtmt_Dayresist                         -4.78317    0.70978  -6.739 1.60e-11
## protegens1                              -1.57374    0.25504  -6.171 6.80e-10
## community_expected_mu                   -0.71331    0.31059  -2.297 0.021640
## CommRich                                -0.04159    0.07091  -0.586 0.557584
## Heatcontrol:Trtmt_Dayrecov_2            -0.80020    0.22933  -3.489 0.000484
## Heatcontrol:Trtmt_Dayresist              4.68262    0.72165   6.489 8.65e-11
## Heatcontrol:protegens1                  -0.07258    0.21600  -0.336 0.736868
## Trtmt_Dayrecov_2:protegens1             -0.01591    0.22686  -0.070 0.944073
## Trtmt_Dayresist:protegens1               1.36098    0.77106   1.765 0.077552
## protegens1:CommRich                      0.04398    0.08902   0.494 0.621299
## Heatcontrol:Trtmt_Dayrecov_2:protegens1  0.80333    0.31485   2.551 0.010727
## Heatcontrol:Trtmt_Dayresist:protegens1  -1.40448    0.79761  -1.761 0.078263
##                                            
## (Intercept)                             ***
## Heatcontrol                                
## Trtmt_Dayrecov_2                           
## Trtmt_Dayresist                         ***
## protegens1                              ***
## community_expected_mu                   *  
## CommRich                                   
## Heatcontrol:Trtmt_Dayrecov_2            ***
## Heatcontrol:Trtmt_Dayresist             ***
## Heatcontrol:protegens1                     
## Trtmt_Dayrecov_2:protegens1                
## Trtmt_Dayresist:protegens1              .  
## protegens1:CommRich                        
## Heatcontrol:Trtmt_Dayrecov_2:protegens1 *  
## Heatcontrol:Trtmt_Dayresist:protegens1  .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
####################
# Select the best model across all data subsets
####################
# for each information criterion, get the average across all data subsets
meanIC <- data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit),
                     AIC = sapply(absDen_mods6h, AIC) + sapply(absDen_mods12h, AIC) + sapply(absDen_mods24h, AIC) + sapply(absDen_mods48h, AIC),
                     AICc = sapply(absDen_mods6h, AICc) + sapply(absDen_mods12h, AICc) + sapply(absDen_mods24h, AICc) + sapply(absDen_mods48h, AICc),
                     BIC = sapply(absDen_mods6h, BIC) + sapply(absDen_mods12h, BIC) + sapply(absDen_mods24h, BIC) + sapply(absDen_mods48h, BIC)) %>%
            mutate(AIC = AIC/4,
                   AICc = AICc/4,
                   BIC = BIC/4) %>%
              mutate(dAIC = min(AIC)-AIC,
                     dAICc = min(AICc)-AICc,
                     dBIC = min(BIC)-BIC)
meanIC %>% arrange(BIC)
meanIC %>% arrange(AIC)
# clean up
rm(meanIC)
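For compactness, averaging the information criteria across the four data subsets can also be written by mapping over a list of the model lists. This is a sketch, not a replacement for the table above; it assumes the four lists share identical model names in the same order, and `mod_lists` is a name introduced here:

mod_lists <- list(absDen_mods6h, absDen_mods12h, absDen_mods24h, absDen_mods48h)
data.frame(pars = sapply(absDen_mods48h, npar_of_glmmTMB_fit), # npar is the same in every subset
           AIC  = rowMeans(sapply(mod_lists, function(m) sapply(m, AIC))),
           AICc = rowMeans(sapply(mod_lists, function(m) sapply(m, AICc))),
           BIC  = rowMeans(sapply(mod_lists, function(m) sapply(m, BIC)))) %>%
  mutate(dAIC = min(AIC) - AIC,
         dAICc = min(AICc) - AICc,
         dBIC = min(BIC) - BIC) %>%
  arrange(BIC)
rm(mod_lists)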

Annoyingly, both the best model and the preferred model change once we consider the data without any extinctions…

Welp, I’m still going to use the “*prot +mu + prot*CommRich” model. There is one new result for late recovery at 48h: a positive effect of heat duration. When I repeated the downstream analysis with the model preferred for the extinction data, ““, the results were the same, so this appears to be a trend that is genuinely present in the data rather than an artifact of the model selection itself.

Calculate effect sizes conditional on protegens (No extinctions)

# plot the effect size contingent on protegens
  # use the same model as above, "*prot +mu + prot*CommRich"
effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_6h),
                                sigma(absDen_mods6h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods6h[["*prot +mu + prot*CommRich"]]))
effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_12h),
                                sigma(absDen_mods12h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods12h[["*prot +mu + prot*CommRich"]]))
effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_24h),
                                sigma(absDen_mods24h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods24h[["*prot +mu + prot*CommRich"]]))
effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +mu + prot*CommRich"]], ~ Heat | Trtmt_Day*protegens + community_expected_mu + protegens*CommRich, data = absDen_48h),
                                sigma(absDen_mods48h[["*prot +mu + prot*CommRich"]]),
                                edf = df.residual(absDen_mods48h[["*prot +mu + prot*CommRich"]]))


# use the overall preferred model for the data excluding extinctions:
#effect_6h_protegens <- eff_size(emmeans(absDen_mods6h[["*prot +CommRich"]], ~ Heat | Trtmt_Day*protegens + CommRich, data = absDen_6h),
#                                sigma(absDen_mods6h[["*prot +CommRich"]]),
#                                edf = df.residual(absDen_mods6h[["*prot +CommRich"]]))
#effect_12h_protegens <- eff_size(emmeans(absDen_mods12h[["*prot +CommRich"]], ~ Heat | #Trtmt_Day*protegens + CommRich, data = absDen_12h),
#                                sigma(absDen_mods12h[["*prot +CommRich"]]),
#                                edf = df.residual(absDen_mods12h[["*prot +CommRich"]]))
#effect_24h_protegens <- eff_size(emmeans(absDen_mods24h[["*prot +CommRich"]], ~ Heat | #Trtmt_Day*protegens + CommRich, data = absDen_24h),
#                                sigma(absDen_mods24h[["*prot +CommRich"]]),
#                                edf = df.residual(absDen_mods24h[["*prot +CommRich"]]))
#effect_48h_protegens <- eff_size(emmeans(absDen_mods48h[["*prot +CommRich"]], ~ Heat | Trtmt_Day*protegens + CommRich, data = absDen_48h),
#                                sigma(absDen_mods48h[["*prot +CommRich"]]),
#                                edf = df.residual(absDen_mods48h[["*prot +CommRich"]]))

# a function that extracts the confidence intervals from eff_size contingent on protegens
get_effsize_CIs <- function(eff_size_object, heat_trtmt) {
  data.frame(Heat = heat_trtmt,
             # this function will fail with "*prot +CommRich"
             # so you need to modify the code as indicated in the commented out bits:
             CommRich = confint(eff_size_object)[[5]], #[[4]],
             Trtmt_Day = confint(eff_size_object)[[2]], #[[2]],
             protegens = confint(eff_size_object)[[3]],
             community_expected_mu = confint(eff_size_object)[[4]], # this whole line needs to be removed
             effect_est = confint(eff_size_object)[["effect.size"]], # index by name so this cannot clash with CommRich at [[5]]
             effect_loCI = confint(eff_size_object)[[8]],
             effect_hiCI = confint(eff_size_object)[[9]])
}
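If the positional indexing ever becomes fragile, a name-based variant avoids the per-model index juggling entirely. This is a sketch: `get_effsize_CIs_byName` is a name introduced here, it assumes confint() returns emmeans’ standard columns “effect.size”, “lower.CL”, and “upper.CL”, and it keeps whichever annotation columns a given model actually has:

get_effsize_CIs_byName <- function(eff_size_object, heat_trtmt) {
  ci <- as.data.frame(confint(eff_size_object))
  # keep only the annotation columns present in this model's output
  annot <- intersect(c("Trtmt_Day", "protegens", "community_expected_mu", "CommRich"),
                     names(ci))
  data.frame(Heat = heat_trtmt,
             ci[annot],
             effect_est  = ci[["effect.size"]],
             effect_loCI = ci[["lower.CL"]],
             effect_hiCI = ci[["upper.CL"]])
}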

# create a data.frame for plotting marginal effect sizes using a forest plot
productivity_protegens <- data.frame()
productivity_protegens <- rbind(productivity_protegens,
                              get_effsize_CIs(effect_6h_protegens, heat_trtmt = 6),
                              get_effsize_CIs(effect_12h_protegens, heat_trtmt = 12),
                              get_effsize_CIs(effect_24h_protegens, heat_trtmt = 24),
                              get_effsize_CIs(effect_48h_protegens, heat_trtmt = 48))
# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_protegens$Trtmt_Day <- factor(productivity_protegens$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

#plot
ggplot(productivity_protegens,
       aes(x = effect_est, y = as.factor(Heat), colour = Trtmt_Day, shape = protegens)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = effect_loCI, xmax = effect_hiCI), height = 0.1) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       shape = "protegens\npresent?",
       title = "*prot +mu + prot*CommRich")

# we can do a posthoc on this to illustrate statistically significant effects
posthocPROT_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting
prod_effects_protegens <- data.frame()
prod_effects_protegens <- rbind(prod_effects_protegens,
                               get_posthoc_YESprot(posthocPROT_6h, heat_trtmt = 6),
                               get_posthoc_YESprot(posthocPROT_12h, heat_trtmt = 12),
                               get_posthoc_YESprot(posthocPROT_24h, heat_trtmt = 24),
                               get_posthoc_YESprot(posthocPROT_48h, heat_trtmt = 48))

## note that for the decoupling plots I am using a Bonferroni-corrected alpha.
# let's get those confidence intervals because we will need them to plot decoupling below:
posthocPROT_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthocPROT_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day*protegens, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
                  get_posthoc_YESprot(posthocPROT_6h_WIDER, heat_trtmt = 6),
                  get_posthoc_YESprot(posthocPROT_12h_WIDER, heat_trtmt = 12),
                  get_posthoc_YESprot(posthocPROT_24h_WIDER, heat_trtmt = 24),
                  get_posthoc_YESprot(posthocPROT_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[5:6] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
# combine the wider CIs with the effect sizes
prod_effects_protegens <- inner_join(prod_effects_protegens,
                                     widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, protegens, est, groups, Heat)`
rm(widerCIs)


# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
prod_effects_protegens$Trtmt_Day <- factor(prod_effects_protegens$Trtmt_Day,
                                          levels = c("resist", "recov_1", "recov_2"))
levels(prod_effects_protegens$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(prod_effects_protegens,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day, shape=protegens)) +
  facet_grid( ~ protegens) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-0.0035, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y="Heat duration",
       shape = "protegens\npresent?",
       title = "*prot +mu + prot*CommRich")

Decoupling conditional on protegens (No extinctions)

decoupling_productivity <- prod_effects_protegens %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
                              rename(loCI = loCI_bonAlpha,
                                     hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")

# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
                              pivot_wider(names_from = Trtmt_Day,
                                          values_from = c(est, loCI, hiCI, SE, groups))

# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
                                                        decoupling_productivity$groups_early_recov,
                                                        decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
                                                       decoupling_productivity$groups_late_recov,
                                                       decoupling_productivity$groups_resist)
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)


# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.025, 0.025), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0025, 0.0025), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (NO extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Early Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
    facet_grid(~protegens) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_early_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_early_recov - est_early_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (NO extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Early Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# next plot the decoupling on later recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.025, 0.025), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.005, 0.005), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (NO extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Late Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
    facet_grid(~protegens) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_late_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_late_recov - est_late_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (NO extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Late Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# finally, estimate decoupling as the (perpendicular) distance from each point to the y = x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_early_recov,
                                  recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
                          early_decoupling)

ggplot(early_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  labs(title = "Early recovery (NO extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_late_recov,
                                  recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,1:2],
                          late_decoupling)

ggplot(late_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  facet_grid(~protegens) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  scale_colour_viridis_d(option = "viridis", end=0.85) +
  labs(title = "Late recovery (NO extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

Calculate the effect sizes marginal on protegens (No extinctions)

##############################
# effect sizes with protegens as non-focal
##############################
# We are not interested in the protegens effects per se, so repeat the post-hoc comparisons, this time averaging over protegens.

posthoc_6h <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h)
## NOTE: Results may be misleading due to involvement in interactions
# create a data.frame for plotting marginal effect sizes using a forest plot with the group labels
productivity_effects <- data.frame()
productivity_effects <- rbind(productivity_effects,
                              get_posthoc_NOprot(posthoc_6h, heat_trtmt = 6),
                              get_posthoc_NOprot(posthoc_12h, heat_trtmt = 12),
                              get_posthoc_NOprot(posthoc_24h, heat_trtmt = 24),
                              get_posthoc_NOprot(posthoc_48h, heat_trtmt = 48))


## Note: for the decoupling plots below I use a Bonferroni-corrected alpha of 0.05/4 = 0.0125, i.e., 98.75% confidence intervals (level = 0.9875).
# let's get those confidence intervals because we will need them to plot decoupling below:
posthoc_6h_WIDER <- emmeans(effect_6h_protegens, pairwise ~ Trtmt_Day, data = absDen_6h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_12h_WIDER <- emmeans(effect_12h_protegens, pairwise ~ Trtmt_Day, data = absDen_12h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_24h_WIDER <- emmeans(effect_24h_protegens, pairwise ~ Trtmt_Day, data = absDen_24h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
posthoc_48h_WIDER <- emmeans(effect_48h_protegens, pairwise ~ Trtmt_Day, data = absDen_48h, level=0.9875)
## NOTE: Results may be misleading due to involvement in interactions
# put these wider CIs into a table
widerCIs <- data.frame()
widerCIs <- rbind(widerCIs,
                  get_posthoc_NOprot(posthoc_6h_WIDER, heat_trtmt = 6),
                  get_posthoc_NOprot(posthoc_12h_WIDER, heat_trtmt = 12),
                  get_posthoc_NOprot(posthoc_24h_WIDER, heat_trtmt = 24),
                  get_posthoc_NOprot(posthoc_48h_WIDER, heat_trtmt = 48))
# rename the columns to remind us that this is the Bonferroni corrected alpha
colnames(widerCIs)[4:5] <- c("loCI_bonAlpha", "hiCI_bonAlpha")
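The level = 0.9875 used above corresponds to a familywise alpha of 0.05 split across the four heat-duration models (presumably the intended correction; the arithmetic is shown here for clarity):

```r
# Bonferroni: divide the familywise alpha across the 4 heat-duration models,
# then convert to the matching confidence level for emmeans' `level` argument
alpha_family <- 0.05
n_models <- 4
alpha_bon <- alpha_family / n_models  # 0.0125
level_bon <- 1 - alpha_bon            # 0.9875
```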
# combine the wider CIs with the effect sizes
productivity_effects <- inner_join(productivity_effects,
                                   widerCIs %>% select(-SE))
## Joining with `by = join_by(Trtmt_Day, est, groups, Heat)`
rm(widerCIs)

# re-order the levels of Trtmt_Day to go from resistance to recovery then rename them for nice plotting
productivity_effects$Trtmt_Day <- factor(productivity_effects$Trtmt_Day,
                                         levels = c("resist", "recov_1", "recov_2"))
levels(productivity_effects$Trtmt_Day) <- c("Resistance", "Early Recovery", "Late Recovery")

# plot with group labels
ggplot(productivity_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.1) +
  geom_text(position = position_dodge(width = 0.5),
            aes(x=-0.008, label=groups)) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Total Density",
       y = "Heat duration (hrs)",
       title = "protegens as non-focal predictor (i.e., marginalized)")

#######
# finally, run a series of pairwise two-tailed Welch's t-tests comparing the effect sizes for the same treatment day between heat durations
#######
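run_ttest() is defined earlier in the document. As a reference point, a two-sample Welch's t-test computed from summary statistics alone (mean, SD, n) can be sketched as below; the function name and interface here are illustrative, not necessarily those of the actual helper:

```r
# Illustrative only: a Welch two-sample t-test computed from summary
# statistics (mean, SD, n), roughly what a helper like run_ttest() must do
# when it works from summary tables rather than raw data.
welch_from_summary <- function(mean_x, sd_x, n_x, mean_y, sd_y, n_y) {
  v_x <- sd_x^2 / n_x  # squared standard error of group x's mean
  v_y <- sd_y^2 / n_y  # squared standard error of group y's mean
  t_statistic <- (mean_x - mean_y) / sqrt(v_x + v_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
  pvalue <- 2 * pt(-abs(t_statistic), df)
  data.frame(t_statistic, df, pvalue)
}
```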

# estimate the sample sizes
temp <- productivity_effects # copy the effects to temp
productivity_effects <- rbind(temp %>% filter(Heat == 6) %>% mutate(n = estimate_n(absDen_6h)),
                              temp %>% filter(Heat == 12) %>% mutate(n = estimate_n(absDen_12h)),
                              temp %>% filter(Heat == 24) %>% mutate(n = estimate_n(absDen_24h)),
                              temp %>% filter(Heat == 48) %>% mutate(n = estimate_n(absDen_48h)))
rm(temp)

# estimate the SD from the SE
productivity_effects <- productivity_effects %>% mutate(SD = SE * sqrt(n)) %>%
    # re-order by Heat and Trtmt_Day
                          arrange(Heat, Trtmt_Day)
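The back-calculation above relies on the standard error of a mean being SD/sqrt(n); a toy example with made-up numbers:

```r
# SE of a mean = SD / sqrt(n), so the SD is recovered as SD = SE * sqrt(n).
# The numbers here are invented purely for illustration.
se <- 0.02
n <- 9
sd_est <- se * sqrt(n)  # 0.06
```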

# after the arrange() above, rows 1, 4, 7, 10 are the Resistance rows for heat durations 6, 12, 24, 48;
# combn() enumerates their pairwise comparisons, and the +1/+2 offsets give the matching Early and Late Recovery comparisons
temp <- t(combn(c(1,4,7,10), 2))
combos <- rbind(temp, temp+1, temp+2)
rm(temp)
# loop through all the combinations and do the t-tests
prodEffects_ttests <- data.frame()
for(i in 1:nrow(combos)){
  prodEffects_ttests <- rbind(prodEffects_ttests,
                             run_ttest(row_x = combos[i,1],
                                       row_y = combos[i,2],
                                       summary_stats_df = productivity_effects))
}
prodEffects_ttests$adjusted_p <- p.adjust(prodEffects_ttests$pvalue, method = "bonferroni")
prodEffects_ttests$Trtmt_Day <- productivity_effects$Trtmt_Day[combos[,1]]
prodEffects_ttests$Heat_1 <- productivity_effects$Heat[combos[,1]]
prodEffects_ttests$Heat_2 <- productivity_effects$Heat[combos[,2]]

print(prodEffects_ttests)
##      t_statistic       df       pvalue   adjusted_p      Trtmt_Day Heat_1
## t    10.47327399 30.75149 1.154061e-11 2.077310e-10     Resistance      6
## t1   27.52904820 32.45710 4.475729e-24 8.056312e-23     Resistance      6
## t2   30.95878975 11.06125 4.243667e-12 7.638601e-11     Resistance      6
## t3   14.44861398 27.96645 1.694667e-14 3.050400e-13     Resistance     12
## t4   28.99508330 11.15798 7.358535e-12 1.324536e-10     Resistance     12
## t5   26.59511042 10.99768 2.476558e-11 4.457805e-10     Resistance     24
## t6   -0.07249429 32.00521 9.426597e-01 1.000000e+00 Early Recovery      6
## t7   13.22879424 31.94888 1.654272e-14 2.977690e-13 Early Recovery      6
## t8    1.84987737 16.06394 8.281448e-02 1.000000e+00 Early Recovery      6
## t9   13.29354339 28.39949 1.019381e-13 1.834887e-12 Early Recovery     12
## t10   1.89468118 15.97684 7.638020e-02 1.000000e+00 Early Recovery     12
## t11  -5.90091191 14.21475 3.628640e-05 6.531553e-04 Early Recovery     24
## t12  14.19702257 31.87762 2.481735e-15 4.467122e-14  Late Recovery      6
## t13   5.97944421 27.94791 1.948392e-06 3.507105e-05  Late Recovery      6
## t14  -6.79305127 15.70073 4.770715e-06 8.587286e-05  Late Recovery      6
## t15 -11.29482419 23.70829 5.042086e-11 9.075755e-10  Late Recovery     12
## t16 -15.47112543 15.82356 5.674596e-11 1.021427e-09  Late Recovery     12
## t17 -10.43356335 12.48399 1.585933e-07 2.854680e-06  Late Recovery     24
##     Heat_2
## t       12
## t1      24
## t2      48
## t3      24
## t4      48
## t5      48
## t6      12
## t7      24
## t8      48
## t9      24
## t10     48
## t11     48
## t12     12
## t13     24
## t14     48
## t15     24
## t16     48
## t17     48
# these p-values seem overly optimistic, so use a conservative alpha = 1e-3 when interpreting them


################################
# Plot figure for main text: Figure 4b
################################
png(filename="./figures/Fig4B_plot.png", width = 4.48, height = 2.61, units = "in", res=300)
ggplot(productivity_effects,
       aes(x = est, y = as.factor(Heat), colour = Trtmt_Day)) +
  geom_vline(xintercept = 0, colour="darkgrey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5),
                 aes(xmin = loCI, xmax = hiCI), height = 0.15) +
  scale_colour_manual(values=trtmt_pal) +
  labs(x = "Effect Size on Productivity",
       y="Heat Duration (hrs)",
       colour = "Treatment\nDay") +
  theme(legend.position="none")
dev.off()
## png 
##   2

Decoupling marginal on protegens (No extinctions)

decoupling_productivity <- productivity_effects %>% select(-loCI, -hiCI)
# keep just the Bonferroni-corrected (wider) confidence intervals
decoupling_productivity <- decoupling_productivity %>%
                              rename(loCI = loCI_bonAlpha,
                                     hiCI = hiCI_bonAlpha)
# for easier coding, rename the levels of Trtmt_Day
levels(decoupling_productivity$Trtmt_Day) <- c("resist", "early_recov", "late_recov")

# create data.frame for plotting
decoupling_productivity <- decoupling_productivity %>%
                            select(-n, -SD) %>%
                              pivot_wider(names_from = Trtmt_Day,
                                          values_from = c(est, loCI, hiCI, SE, groups))

# columns that indicate if resistance is significantly different from recovery
decoupling_productivity$early_recov_VS_resist <- mapply(are_groups_different,
                                                        decoupling_productivity$groups_early_recov,
                                                        decoupling_productivity$groups_resist)
decoupling_productivity$late_recov_VS_resist <- mapply(are_groups_different,
                                                       decoupling_productivity$groups_late_recov,
                                                       decoupling_productivity$groups_resist)
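are_groups_different() is defined earlier in the document. For emmeans-style compact letter displays, two estimates are significantly different exactly when their letter codes share no letter; a minimal illustrative version (hypothetical name, the real helper may differ):

```r
# Illustrative only: two compact-letter-display codes (e.g. "a" vs "ab")
# indicate a significant difference iff they share no letters
cld_differ <- function(g1, g2) {
  letters_1 <- strsplit(trimws(g1), "")[[1]]
  letters_2 <- strsplit(trimws(g2), "")[[1]]
  !any(letters_1 %in% letters_2)
}
```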
# clean up extra columns
decoupling_productivity <- decoupling_productivity %>% select(-groups_resist, -groups_early_recov, -groups_late_recov)


# first plot the decoupling on early recovery
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
  geom_hline(yintercept = 0, colour="grey") +
  geom_vline(xintercept = 0, colour="grey") +
  geom_abline(slope = 1) +
  geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
  geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
  geom_errorbar(aes(ymin = loCI_early_recov, ymax = hiCI_early_recov), width=0) +
  # center the plot on 0,0:
  scale_x_continuous(limits = c(-0.019, 0.019), expand = c(0, 0)) +
  scale_y_continuous(limits = c(-0.0014, 0.0014), expand = c(0, 0)) +
  scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
  scale_fill_manual(values=c("white", "black")) +
  labs(title = "Decoupling of productivity (NO extinct reps)",
       x = "Resistance +/- 95% CI",
       y = "Early Recovery +/- 95% CI",
       colour = "Heat\nDuration",
       fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# here's another way to plot it where the confidence intervals are shown as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_early_recov, colour = as.factor(Heat))) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(early_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_early_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_early_recov - est_early_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (NO extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Early Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# next plot the decoupling on later recovery
fig5a <- ggplot(decoupling_productivity,
            aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
          geom_hline(yintercept = 0, colour="grey") +
          geom_vline(xintercept = 0, colour="grey") +
          geom_abline(slope = 1) +
          geom_point(shape=21, size=2, aes(fill=as.factor(late_recov_VS_resist))) +
          geom_errorbarh(aes(xmin = loCI_resist, xmax = hiCI_resist), height=0) +
          geom_errorbar(aes(ymin = loCI_late_recov, ymax = hiCI_late_recov), width=0) +
          # center the plot on 0,0:
          scale_x_continuous(limits = c(-0.019, 0.019), expand = c(0, 0)) +
          scale_y_continuous(limits = c(-0.003, 0.003), expand = c(0, 0)) +
          scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
          scale_fill_manual(values=c("white", "black")) +
          labs(x = "Resistance",
               y = "Late Recovery",
               colour = "Heat\nDuration",
               fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

print(fig5a + labs(title = "Decoupling of productivity (NO extinct reps)"))

################################
# Plot figure for main text: Figure 5a
################################
png(filename="./figures/Fig5A.png", width = 6.25, height = 3.68, units = "in", res=300)
print(fig5a)
dev.off()
## png 
##   2
# late recovery with CI plotted as ellipses:
ggplot(decoupling_productivity,
       aes(x = est_resist, y = est_late_recov, colour = as.factor(Heat))) +
    geom_hline(yintercept = 0, colour="grey") +
    geom_vline(xintercept = 0, colour="grey") +
    geom_abline(slope = 1) +
    geom_point(shape=21, size=3, aes(fill=as.factor(late_recov_VS_resist))) +
    scale_colour_viridis_d(option = "plasma", begin=0.2, end = 0.9) +
    scale_fill_manual(values=c("white", "black")) +
    geom_ellipse(aes(x0 = est_resist,
                     y0 = est_late_recov,
                     # radius on x direction:
                     a = hiCI_resist - est_resist,
                     # radius on y direction:
                     b = hiCI_late_recov - est_late_recov,
                     angle = 0)) +
    labs(title = "Decoupling of productivity (NO extinct reps)",
         x = "Resistance +/- 95% CI",
         y = "Late Recovery +/- 95% CI",
         colour = "Heat\nDuration",
         fill="Resistance\nvs. Recovery\nSignificantly\nDifferent?")

# finally, estimate decoupling as the (perpendicular) distance from each point to the y = x line
# calculate decoupling between resistance and early recovery
early_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_early_recov,
                                  recov_hiCI = hiCI_early_recov)))
# add annotation
early_decoupling <- cbind(decoupling_productivity[,1:2],
                          early_decoupling)
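For reference, the perpendicular distance from a point (resist, recov) to the y = x line is (recov - resist)/sqrt(2), signed so that positive values mean recovery exceeds resistance. A minimal sketch under the assumption that estimate_decoupling(), defined earlier, is built on this quantity; its actual CI propagation may be more involved:

```r
# Illustrative only: signed perpendicular distance from (resist, recov)
# to the 1:1 line; positive when the recovery effect exceeds resistance
distance_to_identity <- function(resist, recov) {
  (recov - resist) / sqrt(2)
}
```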

ggplot(early_decoupling,
       aes(x = as.factor(Heat), y = est_decoupling)) +
  geom_hline(yintercept = 0, colour = "grey") +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbar(position = position_dodge(width = 0.5),
                aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                alpha=0.4, width=0.1) +
  labs(title = "Early recovery (NO extinct reps)",
       y = "Decoupling +/- 95% CI",
       x = "Heat Duration (hrs)")

# calculate decoupling between resistance and late recovery
late_decoupling <- t(with(decoupling_productivity,
                           mapply(estimate_decoupling,
                                  resist_est = est_resist,
                                  resist_hiCI = hiCI_resist,
                                  recov_est = est_late_recov,
                                  recov_hiCI = hiCI_late_recov)))
# add annotation
late_decoupling <- cbind(decoupling_productivity[,c(1:2, 15)],
                          late_decoupling)

# plot for main text:
fig5b <- ggplot(late_decoupling,
                aes(x = as.factor(Heat), y = est_decoupling)) +
          geom_hline(yintercept = 0, colour = "grey") +
          geom_point(position = position_dodge(width = 0.5)) +
          geom_errorbar(aes(ymin = loCI_decoupling, ymax = hiCI_decoupling),
                        width=0.05) +
          labs(y = "Decoupling",
               x = "Heat Duration (hrs)")
print(fig5b + labs(title = "Late recovery (NO extinct reps)"))

################################
# Plot figure for main text: Figure 5b
################################
png(filename="./figures/Fig5B.png", width = 4.7, height = 2.0, units = "in", res=300)
print(fig5b)
dev.off()
## png 
##   2
# clean up
rm(absDen_6h, absDen_12h, absDen_24h, absDen_48h, absDen_mods6h, absDen_mods12h, absDen_mods24h, absDen_mods48h,
   combos, decoupling_productivity, early_decoupling, late_decoupling,
   effect_6h_protegens, effect_12h_protegens, effect_24h_protegens, effect_48h_protegens,
   posthoc_6h, posthoc_12h, posthoc_24h, posthoc_48h, posthoc_6h_WIDER, posthoc_12h_WIDER, posthoc_24h_WIDER, posthoc_48h_WIDER, posthocPROT_6h, posthocPROT_12h, posthocPROT_24h, posthocPROT_48h, posthocPROT_6h_WIDER, posthocPROT_12h_WIDER, posthocPROT_24h_WIDER, posthocPROT_48h_WIDER,
   prod_effects_protegens, prodEffects_ttests, productivity_effects, productivity_protegens,
   fig5a, fig5b)